The bottleneck of building expert systems is knowledge acquisition, and one long-range solution is
for the program to learn via discovery. New domains of knowledge can be developed by using
heuristics, yet as they emerge new heuristics are needed. They in turn can be discovered by using a
body of heuristics for guidance. How exactly does this process work? Must there be a separate
body of "meta-heuristics"? How intertwined are heuristics with Representation of knowledge? In
trying to find new heuristics, is it cost-effective to try to improve the existing representation of
knowledge, and if so how can this be automated? What is the nature of heuristics, their "first-order
theory"? What are the implications of such a theory upon the design of a program which discovers
new heuristics? These questions are among those that our research, and this paper, address.


1. MOTIVATION

Several recent programs in Artificial Intelligence (AI) perform complex tasks demanding a large
corpus of expert knowledge [Feigenbaum 77]. Consider, for example, the PROSPECTOR program for
evaluating the mineral potential of a site, the MYCIN program for medical diagnosis, and the
MOLGEN program for planning experiments in molecular genetics. To construct such a system, a
knowledge engineer talks to a human expert, extracts domain-specific knowledge, and adds it to a
growing knowledge base usable by a computer program (see Fig. 1). The critical stage of this
process, the limiting step, is the transfer of expertise. From the program's point of view, the
limitation is the slow rate at which it acquires knowledge. This is the central problem facing
knowledge engineering today, the bottleneck of knowledge acquisition.

Figure 1: The bottleneck of knowledge acquisition is transfer of expertise. This comprises (i) the
expert's difficulty in articulating what he knows, and (ii) the impedance mismatch between the
concepts and vocabulary of the expert and the knowledge engineer.


Two possible solutions to this problem suggest themselves (though they are not mutually exclusive).
First, one might try somehow to widen the channel joining expert to program, for example by
building a sophisticated natural language interface.

The difficulty with this is that the expert must communicate not merely the "facts" of his field, but
also the heuristics: the informal judgmental rules which guide him. These are rarely thought about
concretely, and almost never appear in journal articles, textbooks, or university courses. Thus, even
with a wider channel, the expert would have difficulty in verbalizing his heuristics.


The second possible solution is to sever the umbilicus entirely: eliminate the knowledge engineer
and the human expert, expose the program to the environment, and let it discover new knowledge
on its own. Can this be done? Since knowledge comprises both facts and heuristics, the question
divides into two parts: can new domain concepts and relationships be discovered, and can new
domain heuristics be discovered? This paper is addressed to these questions, and it presents
evidence that the answers are affirmative.

Along the way, an elementary "theory of heuristics" accrues. Our initial definition of a heuristic is:
a piece of knowledge capable of suggesting plausible actions to follow or implausible ones to avoid.
In Section 3, it becomes apparent that this is insufficient: for a body of heuristics to be effective
(useful for guiding rather than merely for rationalizing in hindsight) each heuristic must specify a
situation or context in which its actions are especially appropriate or inappropriate. The theory
developed in Section 5 is based on this definition.


2.  OVERVIEW 


2.1.  The  Central  Line  of  the  Argument 

1. New domains of knowledge δ can be developed by using heuristics. Radically new concepts and
relations connecting them can be discovered by employing a large corpus of heuristics both to
suggest plausible actions and to prune implausible ones. To accomplish this requires heuristics of
varying levels of generality and power, an adequate representation for knowledge, some initial
hypotheses about the nature of domain δ, and the ability to gather data and test conjectures about
that domain.

2. As new domains of knowledge emerge and evolve, new heuristics are needed. A field may change
by the introduction of some new device, theory, technique, paradigm, or observable phenomenon;
each time it does so, the corpus of heuristics useful for dealing with that field may also change.
Consider the body of heuristics useful in planning a trip from San Francisco to Bern. Over the last
century, many new ones have been added, and many old ones have undergone revision.

3. New heuristics can be developed by using heuristics. The first two points imply that new
heuristics must be discovered. How is this done? Since "Heuristics" is a domain of knowledge, like
Electronics, or Mathematics, or Travel planning, perhaps all that is necessary is to set δ = Heuristics
in (1). That is, let the field of heuristics itself grow via heuristic guidance. To do this would require
many types of heuristics (some quite general, some specific to dealing with other heuristics, etc.), an
adequate representation for heuristics, and some hypotheses about the nature of heuristics.

4. As new domains of knowledge emerge and evolve, new representations are needed. Just as the
potency of a fixed body of heuristics decreases as we move into new fields, so too does the potency
of whatever scheme is being used to represent knowledge. Representations must evolve as domain
knowledge accretes.

5. New representations can be developed by using heuristics. Points (1) and (4) imply that new
representations for knowledge must be devised from time to time, and that existing schemes must
change. How can this happen? Since "Representation of knowledge" is a field, just as is
Mathematics, or Electronics, or Heuristics, or Travel planning, perhaps we can somehow set
δ = Representation in (1). That is, allow heuristics to manage the development of new
representations.

The final point is that there is no sixth point to make. The preceding five statements comprise a
research programme to follow, one plan of attack upon the central problem, the bottleneck of
automatic knowledge acquisition.


Other directions of attack are promising, and are being pursued vigorously by several AI
researchers. For most fields, some necessary component required by (1) above is missing (e.g., the
automatic acquisition of data is awkward or impossible). In such cases, the human expert must be
preserved "in the loop" of Figure 1. Any aids for interviewing the expert are then quite important,
tools which facilitate the manual knowledge acquisition process depicted in Figure 1. Indeed,
much recent AI activity focuses on developing such tools: Age, Emycin, Expert, HearsayIII, RLL,
Roget, Rosie, and the various knowledge representation languages.

This paper presents work to date, by the author, along the research programme outlined in Figure
2. Although the development parallels the ordering given therein, the amount of space devoted to
each point is not uniform. Much of the paper is concerned with recounting the experience of
building AM, a computer program which searches for interesting new concepts and conjectures in
elementary mathematics (point (1); see Figure 2 below). The analysis of AM's eventual demise
provides an illustration of (2). Much of the remainder is used to develop the rudiments of a theory
of heuristics, which theory is required for (3). The paper closes with a detailed example illustrating
(3), (4), and (5).


(1) New domains of knowledge δ can be developed by using heuristics.

(2)  As  new  domains  of  knowledge  emerge  and  evolve,  new  heuristics  are  needed. 

(3)  New  heuristics  can  be  developed  by  using  heuristics. 

(4) As new domains of knowledge emerge and evolve, new representations are needed.

(5)  New  representations  can  be  developed  by  using  heuristics. 

Figure  2:  Automatic  knowledge  acquisition  via  discovery 


2.2.  Controlling  the  Use  of  Heuristic  Knowledge 

There is an implied "control structure" for the processes of using and acquiring knowledge (solving
and proposing problems, using and discovering heuristics, choosing and changing representations,
etc.). In fact, it's a nontrivial assumption that a single control loop is powerful enough to manage
both types of processes. Our experiences with expert systems in the past [Feigenbaum 77] have
taught us that the power lies in the knowledge, not in the inference engine.

What is that topmost control loop? It assumes that there is a large corpus of heuristics for choosing
(and shifting between) representations. From time to time, some of these heuristics evaluate how
well the current representations are performing (e.g., is there now some operation which is
performed very frequently, but which is notoriously slow in the current representation?). At any
moment, if the representations used seem to be performing sub-optimally, some attention will be
focused on the problem of shifting to other ones, maintaining the same knowledge simultaneously
in multiple representations, devising whole new systems of representation, etc. Similarly, we assume
there are several heuristics which monitor the adequacy of the existing stock of heuristics, and as
need arises formulate (and eventually work on and solve) tasks of the form "Diagonalization is used
heavily, but has no heuristics associated with it: try to find some new specific heuristics for dealing
with Diagonalization". A typical rule for working on such a task might say "To find heuristics
specific to C, try to analogize heuristics specific to concepts which were discovered the same way
that C was discovered".

It is assumed that these representation heuristics and heuristic heuristics have run for a while, and
the system is in a kind of equilibrium. The representations employed are well suited to the tasks
being performed, and the heuristics being followed serve as quite effective guides for "plausible
move generation" and "implausible move elimination." The system now proceeds for a while
along its object-level pursuits, whatever they may be (proving theorems in plane geometry,
discovering new concepts in programming, etc.). Gradually, the object level may evolve: new
concepts will be uncovered and focused upon, new laboratory techniques will be discovered,
long-standing open questions will be answered, etc. As this occurs, the old representations for
knowledge, and the old set of guiding heuristics, may become less ideal, less effective. This in turn
would be detected by some of the "meta"-heuristics discussed in the last paragraph, and they would
cause the system to recover its equilibrium, to search for new representations and new heuristics to
deal effectively once again with the objects and operations at the object level.

In  other  words,  new  concepts,  conjectures,  theorems,  etc.  emerge  all  the  time;  as  they  are 
investigated,  some  turn  out  to  be  useful  and  some  turn  out  to  be  dead-ends;  using  a  fixed  set  of 
guiding heuristics, the rate at which useful new discoveries are made will decline gradually over
time;  eventually  it's  worth  pausing  in  the  search  for  domain-specific  knowledge,  and  turning 
instead  to  the  problem  of  finding  new  heuristics  (perhaps  by  articulating  experiences  to  date  in  the 
task  domain).  The  discoverer  later  returns  to  his  original  task,  armed  with  new  and  hopefully 
more  powerful  heuristics.  This  cycle  of  looking  for  domain  concepts,  occasionally  punctuated  by  an 
effort  to  find  new  heuristics,  continues  until,  gradually,  it  becomes  harder  and  harder  to  find  new 
heuristics.  At  that  point  it  becomes  worthwhile  to  look  for  new  and  different  representations  for 
knowledge. 

The top-level control structure is thus homeostatic: detecting and correcting for any
inappropriateness of representations employed or heuristics employed. For these purposes, we
believe it suffices to have (and use) a corpus of heuristics for guidance. Of course that top-level
loop could itself be implicitly defined by a set of heuristic rules, and we would expect such rules to
change from time to time, albeit very slowly. If, for example, no new concepts or operations were
defined at the object level for a long period of time, then the need for close monitoring of the
adequacy of the representations being employed would evaporate. One important point is that it is
not necessary to distinguish meta-heuristics from object-level heuristics: they can be represented the
same way, they can be managed by the same interpreter, etc. For example, the very general
recursive rule "To specialize a complex construct, find the component using the most resources, and
replace it by several alternate specializations" applies to specializing laboratory procedures,
mathematical functions, heuristics (including itself!), and representational schemes.


3. HEURISTICS USED TO DEVELOP NEW KNOWLEDGE


"How was X discovered?" When confronted with such a question, the philosopher or scientist will
often retreat behind the mystique of the all-seeing I's: Illumination, Intuition, and Incubation. A
different approach would be to provide a rationalization, a scenario in which a researcher proceeds
reasonably from one step to the next, and ultimately synthesizes the discovery X. In order for the
scenario to be convincing, each step the researcher takes must be justified as a plausible one. Such
justifications are provided by citing heuristics, more or less general rules of thumb, judgmental
guides to what is and is not an appropriate action in some situation.

For example, consider the heuristic in Figure 3. It says that if a function f takes a pair of A's as
arguments, then it's often worth the time and energy to define g(x) = f(x,x), that is, to see what
happens when f's arguments coincide. If f is multiplication, this new function turns out to be
squaring; if f is addition, g is doubling. If f is union or intersection, g is the identity function; if f
is subtraction or exclusive-or, g is identically zero. Thus we see how two useful concepts (squaring,
doubling) and four important conjectures might be discovered by a researcher employing this simple
heuristic.


IF f: A x A --> B,

THEN define g: A --> B as g(x) = f(x,x)

Figure  3.  A  heuristic  which  leads  to  useful  concepts  and  conjectures 
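
The heuristic of Figure 3 is small enough to write down directly. The following is only an illustrative sketch in Python, not AM's LISP code; the name `coalesce` and the lambda definitions of the operations are assumptions introduced here.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def coalesce(f: Callable[[A, A], B]) -> Callable[[A], B]:
    """Figure 3: given f: A x A --> B, define g: A --> B as g(x) = f(x, x)."""
    def g(x: A) -> B:
        return f(x, x)
    return g


# The operations named in the text, written as two-argument functions.
squaring = coalesce(lambda x, y: x * y)           # f = multiplication
doubling = coalesce(lambda x, y: x + y)           # f = addition
self_union = coalesce(lambda s, t: s | t)         # f = union
self_intersection = coalesce(lambda s, t: s & t)  # f = intersection
self_difference = coalesce(lambda x, y: x - y)    # f = subtraction
self_xor = coalesce(lambda x, y: x ^ y)           # f = exclusive-or

if __name__ == "__main__":
    for n in range(5):
        print(n, squaring(n), doubling(n), self_difference(n), self_xor(n))
    s = frozenset({1, 2, 3})
    print(self_union(s) == s, self_intersection(s) == s)
```

Running it over a handful of inputs is the kind of data gathering from which the four conjectures (two identity functions, two functions that are identically zero) could be noticed.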


Elsewhere [Lenat 79], we describe the uses for a heuristic which says "If f: A --> B, and there is some
extremal subset b of B, Then define and study f^-1(b)." If f is Intersection, this heuristic says it's
worth considering pairs of sets which map into extremal kinds of sets. Well, what's an extremal
kind of set? Perhaps we already know about extremely small sets, such as the empty set. Then the
heuristic would cause us to define the relationship of two sets having empty intersection, i.e.,
disjointness. If f is Employed-as, then the above heuristic says it's worth defining, naming, and
studying the group of people with no jobs (zero is an extremely small number of jobs to hold), the
group of people who hold down more than one job (two is an extremely large number of jobs to
hold). If f is Divisors-of then the heuristic would suggest defining the set of numbers with no
divisors, the set of numbers with one divisor, with two divisors, and with three divisors. The third
of these four sets is the concept of prime numbers. Other heuristics cause us to gather data, to do
that by dumping each number from 1 to 1000 into the appropriate set(s), to reject the first two sets
as too small, to notice that every number in the fourth set is a perfect square, to take their square
roots, and finally to notice that they then coincide precisely with the third set of numbers. Now
that we have the definition of primes, and we have found a surprising conjecture involving them, we
shall say that we have discovered them (note that we are nowhere near a proof of that conjecture).
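
The Divisors-of scenario above can be replayed mechanically. The sketch below is an illustration under stated assumptions, not a reconstruction of AM: the function names are hypothetical, and the brute-force divisor test is chosen only for clarity.

```python
import math


def divisors_of(n):
    """Return the set of positive divisors of n (brute force)."""
    return {d for d in range(1, n + 1) if n % d == 0}


def inverse_image_by_count(limit, counts):
    """Dump each number from 1 to limit into the bucket for its divisor count."""
    buckets = {k: [] for k in counts}
    for n in range(1, limit + 1):
        k = len(divisors_of(n))
        if k in buckets:
            buckets[k].append(n)
    return buckets


# Extremely small divisor counts: zero, one, two, and three divisors.
buckets = inverse_image_by_count(1000, counts=(0, 1, 2, 3))
primes = buckets[2]            # the third of the four sets
three_divisors = buckets[3]    # the fourth set
roots = [math.isqrt(n) for n in three_divisors]

print(buckets[0], buckets[1])  # the two sets rejected as too small
print(three_divisors)          # every entry is a perfect square
print(roots == [p for p in primes if p * p <= 1000])
```

Because the data stop at 1000, the square roots match only the primes up to 31 (the square of 37 already exceeds 1000); the conjecture itself concerns the unbounded sets.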

Of course the above instances of discoveries are really just reductions. We can be said to have
reduced the problem "How might Squaring be discovered?" to the somewhat simpler problem
"How might Multiplication be discovered?" by citing the heuristic in Figure 3. Similarly, we
reduced the problem of discovering Primes to the problem of discovering Divisors-of. Such
reductions could be continued, reducing the discovery of Divisors-of to that of Multiplication,
thence to Addition or Cartesian-product, and so forth. Eventually, we would go down all the way
to our conceptual primitives, to concepts so basic that we feel it makes no sense to speak of
discovering them. See Figure 4.

Figure 4. Reducing each concept's discovery to that of a simpler one. Note that multiplication can
be discovered if the researcher knows either addition of numbers or Cartesian products of sets.


Why, then, is the act of creation so cherished? If some significant discoveries are merely one or two
"heuristic applications" away from known concepts, why are even one-step discoveries worth
communicating and getting excited about? The answer is that the discoverer is moving upwards in
the tree, not downwards. He is not rationalizing, in hindsight, how a given discovery might have
been made; rather, he is groping outward into the unknown for some new concept which seems to
be useful or interesting. The downward, analytic search is much more constrained than the upward,
synthetic one. Discoverers move upwards; axiomatizers, colonizers, and pedagogues move
downwards. See Figure 5. Even in this limited situation, the researcher might apply the "Repeat"
heuristic to multiplication, and go off along the vector containing exponentiation,
hyper-exponentiation, etc. Or he might apply "look at inverse of extrema" to Divisors-of in several
ways, for example looking at numbers with very many divisors.

Once a discovery has been made, it is much easier to rationalize it in hindsight, to find some path
downward from it to known concepts, than it was to make that discovery initially. That is the
explanation of the phenomenon we've all experienced after working for a long time on a problem,
the feeling "Why didn't I solve that sooner!" When the reporter is other than ourselves, the feeling
is more like "I could have done that, that wasn't so difficult!" It is the phenomenon of wondering
how a magic trick ever fooled us, once we've seen the method. It enables us to follow mathematical
proofs with a false sense of confidence, being quite unable to prove similar theorems. It is the
reason why we can use Polya's heuristics [Polya 45] to parse a discovery, to explain a plausible route
to it, yet feel very little guidance from them when faced with a problem and a blank piece of paper.

There still is that profusion of upward arrows to contend with. One of the triumphs of AI has been
finding the means to muffle a combinatorial explosion of arrows: one must add some heuristic
guidance criteria. That is, add some additional knowledge to indicate which directions are expected
to be the most promising ones to follow, in any situation. So by a heuristic, from now on, we shall
mean a contingent piece of knowledge, such as the top entry in Figure 6, rather than an
unconstrained Polya-esque maxim (6b). The former is a heuristic, the latter is an explosive.

Figure 5. The more explosive upward search for new concepts


(a) IF the range of one operation has a large intersection with the domain of a second,
       and they both have high worth,
       and either there is a conjecture connecting them or
           the range of the second operation has a large
           intersection with the domain of the first,
    THEN compose them and study the result.

(b) Compose two operations and study the result.

Figure  6.  A  contingent  heuristic  rule  and  an  explosive  one. 
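
The contrast in Figure 6 can be made concrete by counting how many compositions each rule proposes. The sketch below is illustrative only: the `Operation` record, the `overlap` measure, the thresholds `min_overlap` and `min_worth`, and the four toy operations are all assumptions made here, since the text does not say how "large intersection" or "high worth" were measured.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Operation:
    name: str
    domain: frozenset
    range_: frozenset
    worth: int          # hypothetical 0..1000 scale


def overlap(a, b):
    """Fraction of the smaller set that the two sets share (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def explosive_rule(ops):
    """Figure 6(b): propose every ordered pair of distinct operations."""
    return [(f.name, g.name) for f, g in permutations(ops, 2)]


def contingent_rule(ops, conjectures, min_overlap=0.5, min_worth=500):
    """Figure 6(a): propose (f, g) only when the IF part holds."""
    proposals = []
    for f, g in permutations(ops, 2):
        feeds = overlap(f.range_, g.domain) >= min_overlap
        worthy = f.worth >= min_worth and g.worth >= min_worth
        linked = frozenset((f.name, g.name)) in conjectures
        feeds_back = overlap(g.range_, f.domain) >= min_overlap
        if feeds and worthy and (linked or feeds_back):
            proposals.append((f.name, g.name))
    return proposals


# Toy data (hypothetical): small finite stand-ins for domains and ranges.
NUMBERS = frozenset(range(20))
EVENS = frozenset(range(0, 20, 2))
SMALL = frozenset(range(10))
TRUTH = frozenset({"T", "F"})

ops = [
    Operation("double", SMALL, EVENS, 800),
    Operation("halve", EVENS, SMALL, 700),
    Operation("is_even", NUMBERS, TRUTH, 600),
    Operation("negate", TRUTH, TRUTH, 200),
]
conjectures = {frozenset(("double", "is_even"))}

print(len(explosive_rule(ops)))
print(contingent_rule(ops, conjectures))
```

With only four operations the explosive rule already proposes every ordered pair, while the contingent rule keeps the few pairs whose IF part is satisfied; the gap widens quadratically as operations are added.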


There is a partial theory of intelligence here, which claims that discovery can be adequately guided
by a large collection of such heuristic rules. In particular, mathematical discovery may be so
guided. To test this hypothesis, we designed and constructed AM, a LISP program whose task was
to explore elementary finite set theory: gathering empirical data, noticing regularities in them, and
defining new concepts. AM is well described elsewhere [Lenat 79], and a very brief recapitulation
here should suffice.

AM began with one hundred set theory concepts. This included static structures (sets, bags, lists)
and many active operations (union, composition, canonize). For each concept, we supplied very little
information besides its definition. In addition, AM contained 243 heuristic rules for proposing
plausible new concepts, for filling in data about concepts, and for evaluating concepts for
"interestingness". Among them are the two heuristics we saw earlier, for looking at the inverse of
extrema and for looking at the new function g(x) =df f(x,x).

During the course of its longest run (a couple hours), AM defined several hundred concepts, about
half of which were reasonable, and noticed hundreds of simple relationships involving them, most
of which were trivial. AM found several set-theoretic concepts (disjointness, de Morgan's laws),
defined natural numbers, found arithmetic and elementary divisibility theory, and began to bog
down in advanced number theory (after finding the fundamental theorem of arithmetic and
Goldbach's conjecture). Each "discovery" involved relying on over 30 heuristics, and almost all
heuristics participated in dozens of different discoveries; thus the set of heuristics is not merely
"unwound" to produce the discoveries. Since the heuristics did lead to the discoveries, they must in
some sense be an encoding for them, but they are not a conscious or (even in hindsight) obvious
encoding.

AM's basic control structure was simple: select some slot of some concept, and work to fill in
entries for it. Since AM began with over 100 concepts, and each had about 20 slots to fill in
(Examples, Generalizations, Conjectures, Analogies, etc.), there were 2000 small tasks for AM to
perform, initially. This number grew with time, because new concepts would usually be defined
long before 20 slots were filled in on old ones. Each task was placed on an agenda, with symbolic
reasons justifying why it should be attended to. Those tasks having several good reasons would
eventually percolate to the top of the agenda and be worked on. To accomplish the selected task,
AM located relevant heuristics and obeyed them. They in turn caused entries to be filled in on
hitherto blank slots, defined entirely new concepts, and proposed new tasks to be added to the
agenda.
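
An agenda of this kind can be sketched in a few lines. What follows is a minimal illustration rather than AM's mechanism: the class and method names are hypothetical, the numeric ratings are invented, and summing the ratings of a task's reasons is an assumption made here because the text does not give the formula by which reasons were combined.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    concept: str
    slot: str
    reasons: dict = field(default_factory=dict)   # reason text -> rating

    def priority(self):
        # Assumption: a task's priority is simply the sum of its reasons' ratings.
        return sum(self.reasons.values())


class Agenda:
    def __init__(self):
        self._tasks = {}                          # (concept, slot) -> Task

    def propose(self, concept, slot, reason, rating):
        """Add a task, or add a further reason to a task already on the agenda."""
        task = self._tasks.setdefault((concept, slot), Task(concept, slot))
        task.reasons[reason] = rating
        return task

    def pop_best(self):
        """Remove and return the task with the highest priority."""
        key = max(self._tasks, key=lambda k: self._tasks[k].priority())
        return self._tasks.pop(key)

    def __len__(self):
        return len(self._tasks)


agenda = Agenda()
agenda.propose("Divisors-of", "Examples", "no examples known yet", 300)
agenda.propose("Primes", "Conjectures", "Primes was just defined", 200)
agenda.propose("Primes", "Conjectures", "Divisors-of has high worth", 250)
agenda.propose("Sets", "Generalizations", "slot is blank", 100)

best = agenda.pop_best()
print(best.concept, best.slot, best.priority(), len(agenda))
```

The task with two independent reasons outranks the task with one stronger reason, which is the "several good reasons percolate to the top" behavior described above.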

There is one more issue about AM that should be discussed in this paper: how it was able to
efficiently restrict its attention to a small set of potentially relevant heuristics at all times. Consider
for a moment the AM heuristic that says "IF a composition f o g preserves most of the properties that
f had, THEN it's more interesting." That's useful when evaluating the worth of a composition, but
of course is of no help when trying to find examples of Sets. We associated that heuristic with the
Composition concept, the most general concept for which it was relevant. Another heuristic AM has
says "IF the domain and range of an operation coincide, THEN it's more interesting." That one
was tacked onto the Operation concept. But note that since Compositions are special kinds of
Operations, the heuristic should apply to them as well. The general principle at work here is the
following: If a heuristic is relevant to C, then it's also relevant to all specializations of C. If we
look at the AM representation for Composition, we would see a frame-like data structure (schema,
property list) one of whose slots is Generalizations, and one of the entries therein is Operation. This
is AM's way of recording the fact that Composition is a specialization of Operation. The obvious
algorithm, then, when dealing with some specific concept C, is to follow Generalization links
upward, gathering heuristics tacked onto any concept encountered along the way. See Figure 7. In
general, this means that AM's attention is restricted to log(n) heuristics, rather than n. AM can
completely ignore all the rest, and need only evaluate the IF parts of these log(n) potentially
relevant ones. In other words, the Generalization/Specialization hierarchy of concepts has induced a
similar powerful structuring upon the set of heuristics. The power of this technique is dimmed
somewhat by the unequal distribution of heuristics in the Generalization/Specialization tree: a large
number of heuristics clustered near the few topmost (very general) concepts.
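
The "obvious algorithm" of following Generalization links upward can be sketched as below. This is an illustration under assumptions, not AM's representation: the `Concept` record and the function name are hypothetical, the two quoted rules come from the text, and the rules attached to Anything and Sets are placeholders invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    generalizations: list = field(default_factory=list)   # more general Concepts
    heuristics: list = field(default_factory=list)        # rules tacked onto this concept


def relevant_heuristics(concept):
    """Follow Generalization links upward, gathering heuristics along the way."""
    gathered, seen, pending = [], set(), [concept]
    while pending:
        c = pending.pop()
        if c.name in seen:          # a concept may be reachable by several paths
            continue
        seen.add(c.name)
        gathered.extend(c.heuristics)
        pending.extend(c.generalizations)
    return gathered


# Hypothetical fragment of the hierarchy.
anything = Concept("Anything", heuristics=["(placeholder) rule for any concept"])
operation = Concept(
    "Operation", [anything],
    ["IF the domain and range of an operation coincide, THEN it's more interesting"],
)
composition = Concept(
    "Composition", [operation],
    ["IF a composition f o g preserves most of the properties that f had, "
     "THEN it's more interesting"],
)
sets = Concept("Sets", [anything], ["(placeholder) rule about sets"])

for rule in relevant_heuristics(composition):
    print(rule)
```

Only the rules on Composition and its ancestors are returned; the rule attached to Sets is never examined, so the cost depends on the depth of the hierarchy rather than on the total number of heuristics.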

As AM forayed into number theory, it had only heuristics from set theory to guide it. For instance,
when dealing with prime pairs (twin primes), there were no specific heuristics relevant to them; they
were defined in terms of primes, which were defined in terms of divisors-of, which was defined in
terms of multiplication, which was defined in terms of addition, which was defined in terms of
set-union, which (finally!) had a few attached heuristics. Because it lacked number-theory heuristics,
embodying what we would call common-sense about arithmetic, AM's fraction of useless definitions
went way up (Numbers which are both odd and even; Prime triples; The conjecture that there is
only one prime triple (3,5,7) but without understanding why; etc.). It was unexpected and gratifying
that AM should discover numbers and arithmetic at all, but it was disappointing to see the program
begin to thrash. When a few dozen concepts from plane geometry were added to AM, the same
type of thrashing soon occurred; only the addition of specific geometry heuristics would prevent this
collapse.

Figure 7. One branch of the Generalization hierarchy of concepts, with a few of the attached heuristics


There are two relevant conclusions from the AM research: (i) It is possible for a body of heuristics
to effectively guide a program in searching for new concepts and conjectures involving them. (ii) As
new domains of knowledge emerge, the old corpus of heuristics may not be adequate to serve as a
guide in those new domains; rather, new specific heuristics are necessary. Notice that these are also
the first two points in the argument of this paper (see Figure 2).

Before embarking on point (3) of the central argument of Figure 2, it is necessary to have a "theory
of heuristics". Toward that end, we can begin collecting elements of that theory based on our
experiences with AM. See Figure 8. One remark, besides the two mentioned in the last paragraph,
is that heuristics can be used both to generate promising actions and to prune away poor ones.
Thus, AM's search space is never explicitly described; there is no clear notion of a set of legal
operators which defines some immense space of syntactic mathematical concepts and conjectures,
etc. Any such attempt would probably produce a search space of such size as to be useless (100^20
in AM's domain of elementary finite set theory). Rather, AM's set of heuristics implicitly defines its
search space. If you remove a heuristic from AM, it has less to do; this is exactly the opposite of
the case with most heuristic search programs, where heuristics are used exclusively to prune away
implausible paths.

The final remark noted in Figure 8 is that the heuristics fall into a nice hierarchy, induced by the
one between domain concepts. The key point here is that each heuristic has a domain of relevance:
the most general concept to which it's relevant and all the specializations of that concept. This
organization enables the interpreter, through simple inheritance, to focus on the log of the number
of all heuristics in the system, rather than that entire set, at each moment.
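The inheritance described above can be illustrated with a minimal sketch. It is not AM's actual mechanism: the `Concept` class, the `relevant_heuristics` function, and the concept and heuristic names are all hypothetical, chosen only to show how walking up a generalization hierarchy visits a short chain of ancestors rather than every heuristic in the system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A node in a generalization/specialization hierarchy (illustrative only)."""
    name: str
    parent: Optional["Concept"] = None                   # immediate generalization, if any
    heuristics: list[str] = field(default_factory=list)  # heuristics attached at this level


def relevant_heuristics(concept: Concept) -> list[str]:
    """Collect heuristics by walking from a concept up through its generalizations.

    The most specific heuristics come first. Only the ancestors of `concept`
    are visited, never the whole set of heuristics in the system.
    """
    found: list[str] = []
    node: Optional[Concept] = concept
    while node is not None:
        found.extend(node.heuristics)
        node = node.parent
    return found


# A tiny made-up hierarchy; none of these names are taken from AM itself.
anything = Concept("Anything", heuristics=["look for extreme cases"])
operation = Concept("Operation", parent=anything, heuristics=["examine the inverse"])
composition = Concept("Composition", parent=operation, heuristics=["compose with itself"])
predicate = Concept("Predicate", parent=anything, heuristics=["generalize if rarely true"])

print(relevant_heuristics(composition))
```

Heuristics attached to sibling branches (here, those of `Predicate`) are never examined when working on `Composition`, which is the saving the paragraph describes.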


(i) A SET OF HEURISTICS CAN GUIDE CONCEPT DISCOVERY

(ii) A NEW FIELD WILL DEVELOP SLOWLY IF NO SPECIFIC NEW
HEURISTICS FOR IT ARE CONCOMITANTLY DEVELOPED

(iii) HEURISTICS CAN BE USED AS PLAUSIBLE MOVE GENERATORS
OR AS IMPLAUSIBLE MOVE ELIMINATORS

(iv) THE GENERALIZATION/SPECIALIZATION HIERARCHY OF CONCEPTS
INDUCES A SIMILAR STRUCTURE UPON THE SET OF HEURISTICS

Figure  8.  Elements  of  a  theory  of  heuristics,  learned  from  work  on  AM 


4.  Heuristics  change  as  task  domains  do 


Let's continue to explore the notion of a heuristic having a domain of relevance. Consider the
following very special situation: you are asked to guess whether a conjecture is true or false. What
heuristics are useful in guiding you to a decision rapidly? If the conjecture is in the field of plane
geometry, one very powerful technique is to draw a diagram and see whether it holds in that
analogic model. But if the conjecture is in the field of point-set topology, or real analysis, this is a
terrible heuristic which will often lead you into error. For instance, if the conjecture mentions a
function, then any diagram you draw will probably picture a function which is everywhere infinitely
differentiable, even if such is never stated in the conjecture's premises. As a result, many
properties will hold in your diagram that can never be proven from the conjecture's premises. The
appropriate technique in topology or analysis is to pull out your book of 101 favorite
counterexamples, and see whether any of them violate the conjecture. If it passes all of them, then
you may guess it's probably true.

This example dramatizes the idea that the power or utility of a heuristic changes from domain to
domain.  Thus,  as  we  move  from  one  domain  to  another,  the  set  of  heuristics  which  we  should  use 
for  guidance  changes.  Many  of  them  have  higher  or  lower  utility,  some  entirely  new  heuristics  may 
exist,  and  some  of  the  old  ones  may  be  actually  detrimental  if  followed  in  the  new  domain.  For 
instance,  the  "IF  falling  object  THEN  catch  it"  rule  is  useful  for  most  situations,  but  people  are 
burned  when  they  try  to  catch  falling  clothes  irons  and  soldering  irons. 

Heuristics are compiled hindsight: they are nuggets of wisdom which, if only we'd had them sooner,
would have led us to our present state much faster. Even the synthesis of a new discovery via
analogy, aesthetic criteria such as symmetry, or random combination, can be considered to be the
result of employing guidance heuristics (e.g., "Analogies are useful in formulating biological and
sociological theories", "Symmetry is useful in postulating the existence of fundamental particles in
physics", "Randomly looking for regularities in elementary number theory and plane geometry may
be profitable"). Those guidance heuristics were in turn based on several past episodes, hence are
themselves compiled hindsight. Nilsson and others have argued for the primacy of search; we are
simply stating the corollary for the very special case where one must let time flow event nodes past
us for our observation and recording: the primacy of compiled experiential knowledge.

As  new  empirical  evidence  accumulates,  it  may  be  useful  to  recompile  the  heuristics.  Certainly  by 
the  time  you’ve  opened  up  a  whole  new  field,  you  must  recompile  them.  Working  in  point-set 
topology  with  geometry  heuristics  is  not  very  efficient,  nor  was  AM’s  working  in  number  theory 
using  only  heuristics  from  set  theory.  The  set  of  heuristics  must  evolve  as  well:  some  old  ones  are 
no  longer  useful,  some  must  be  refined  to  suit  the  new  domain,  and  some  entirely  new  heuristics 
may  be  useful.  As  the  task  varies,  or  as  time  varies  and  one  gains  new  experiences,  one’s  set  of 
guiding heuristics is no longer optimal. The utility of a heuristic will vary, then, both across tasks
and  across  time,  and  this  variance  is  not  necessarily  continuous. 

Exactly what kinds of changes can occur in a domain of knowledge that might require you to alter
your set of heuristics? In other words, what are the sources of granularity in the space of "fields of
knowledge"? 

First, there might be the invention of a new piece of apparatus. This could be theoretical (such as
Gödel's theorem) or technological (such as the computer). Heuristics spring into being: rules which
tell you how to use such a thing, when it's relevant, how to fix one, what kind to buy, etc. In
addition, many of the old heuristics may be less or (rarely) more useful than they used to be.

Second, there might be a new technique devised, one which doesn't actually depend upon any new
apparatus. Again, this can be theoretical (such as Bentley's widespread application of divide and
conquer in complexity) or practical (such as Maxam and Gilbert's clever method for sequencing
DNA). New heuristics about reliability, applicability, etc. are created, and old ones fade away.

Third, a new phenomenon may be observed. Whenever a new invention occurs, there are often two
immediate new phenomena: the sociological one of how the invention is used, and the "real" one
now  observable  using  the  invention. 

Fourth,  and  most  unusually,  there  may  be  a  newly-explicated  or  newly-isolated  concept  or  field,  one 
which  was  always  around  but  never  spoken  about  explicitly.  The  notion  of  paradigms  is  such  a 
concept,  and  the  whole  field  of  heuristics  itself  is  such  a  field.  For  example,  there  exist  heuristics 
for when to apply heuristics, for whom to invite to talk about heuristics, for how to evaluate a
heuristic's worth, etc.

In other words, "Heuristics" itself is a field of study. As an analogy, consider the field of
"Grammars". It may be discussed theoretically, independent of any particular language, yet to
develop  that  theory  the  researcher  no  doubt  was  always  grounded  in  a  context  of  some  language  or 
other.  Similarly,  to  develop  a  general  theory  of  heuristics  one  must  constantly  deal  with  heuristics 
for  some  specific  field  or  task.  Eventually  the  theory  of  Grammars  advanced  to  the  stage  of 
formalization  where  it  no  longer  needed  such  grounding,  but  Heuristics  is  far  from  there  yet. 

In brief, the sources of granularity in the space of "domains of knowledge" are precisely those
components which, if varied, lead to a new domain of knowledge. In other words, they define what
we mean by a domain of knowledge: a set of phenomena to study, a body of specific problems
about those phenomena which are considered worth working on, and a set of methods (both
theoretical and experimental, mental and material) for attacking such questions. The definition
corresponds closely to what Thomas Kuhn has called "paradigms".

This  section  has  now  contributed  three  new  elements  to  our  growing  theory  of  heuristics: 


(v)  HEURISTICS  ARE  COMPILED  HINDSIGHT 

(vi) THE SPACE OF "DOMAINS OF KNOWLEDGE" IS GRANULAR

(vii)  "HEURISTICS"  IS  ITSELF  A  SEPARATE  FIELD 

Figure  9.  Three  additional  elements  of  a  theory  of  heuristics 


5.  A  Theory  of  Heuristics 

5.1  Why  Heuristics  Work 

The seven items mentioned in Figures 8 and 9 as "elements of a theory of heuristics" actually
sound more like 2nd-order correction terms to some as-yet unstated more fundamental theory.
What is that basic 0th-order theory? What is the central assumption underlying heuristics? It
appears to be the following: "Appropriateness(action, situation) is continuous." That is, Appropriateness,
viewed as a function of actions and of situations, is a continuous function of both variables.

Corollary 1: For a given action, its appropriateness is a continuous function of situation.
Heuristics specify which actions are appropriate (or inappropriate) in a given situation. One
corollary of the central assumption is that if the situation changes only slightly, then the judgment
of which actions are appropriate also changes only slightly. Thus compiled hindsight is useful,
because even though the world changes, what was useful in situation X will be useful again
sometime in situations similar to X. There are two special cases of Corollary 1 worth
mentioning: see Figure 10.


0th: Appropriateness(action, situation) is a continuous function.

COR. 1: If action A is appropriate in situation S,
Then A is appropriate in most situations which are very similar to S.

COR. 1a: Features of the task environment (task) is continuous.

COR. 1b: World (time) is continuous.

COR. 2: If action A is appropriate in situation S,
Then so are most actions which are very similar to A.

Figure 10. The central assumption underlying heuristics, and two special cases


The first of these, Cor. 1a, says that if the task appears to be similar to one you've seen elsewhere,
then many of the features of the task environment will probably be very similar as well: i.e., the
kinds of conjectures which might be found, the solvability and difficulty anticipated with a task, the
kinds of blind alleys which one might be trapped in, etc. may all be the same as they were in that
earlier case. For instance, suppose that a certain theorem, UFT, was useful in proving a result in
number theory. Now another task appears, again proving some number theory result. Because the
tasks are similar, Cor. 1a suggests that UFT be used to try to prove this new result. This is the
basic justification for using analogy as a reasoning mechanism. A sentiment similar to this was
voiced by Poincaré during the last century: The whole idea of analogy is that 'Effects', viewed as a
function of situation, is a continuous function.

The second special case, Cor. 1b, says that the world doesn't change much over time, and is the
foundation  for  the  utility  of  memory.  In  a  world  changing  radically  enough,  rapidly  enough, 
memory  would  be  a  useless  frill;  consider  the  plight  of  an  individual  atom  in  a  gas. 

Corollary  2:  For  a  given  situation,  appropriateness  is  a  continuous  function  of  actions. 
This means that if a particular action was very useful (or harmful) in some situation, it's likely that
any  very  similar  action  would  have  had  similar  consequences.  Cor.  2  justifies  the  use  of  inexact 
reasoning,  of  allocating  resources  toward  finding  an  approximate  answer,  of  satisficing.  It  is  the 
basis  for  employing  "generalization"  as  a  mechanism  for  coping  with  the  world:  if  the 
appropriateness  function  were  not  (usually)  continuous  as  a  function  of  actions,  then  most 
generalizations  would  be  false.  One  may  restate  this  corollary  as  "World  (situation)  is  continuous." 
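One hedged way to write the central assumption down is as ordinary continuity. This assumes, beyond anything stated in the paper, that actions and situations carry distance measures $d_A$ and $d_S$ and that appropriateness is a real-valued function $\mathrm{App}$; all three symbols are introduced here only for illustration.

```latex
% 0th-order assumption: App is continuous at every pair (a, s).
\[
\forall \varepsilon > 0 \;\; \exists \delta > 0 : \quad
d_A(a, a') < \delta \;\wedge\; d_S(s, s') < \delta
\;\Longrightarrow\;
\bigl| \mathrm{App}(a, s) - \mathrm{App}(a', s') \bigr| < \varepsilon .
\]
% Corollary 1 is the special case a' = a (vary only the situation);
% Corollary 2 is the special case s' = s (vary only the action).
```

The qualifier "most" in Figure 10 signals that the paper intends something weaker than this strict statement, so the formula is best read as the idealization rather than the claim itself.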

If the central assumption holds, then the ideal interpreter for heuristics is the one shown in Figure
11. Note that this is very similar to a pure production system interpreter. In any given situation,
some rules will be expected to be relevant (because they were truly relevant in situations very
similar to the present one). One or more of them are chosen and applied (obeyed, evaluated,
executed, fired, etc.). This action will change the situation, and the cycle begins anew. Of course
one can replace the "locate relevant heuristics" subtask by a copy of this whole diagram: that is, it
can be performed under the guidance of a body of heuristics specially suited to the task of finding
heuristics. Similarly, the task of selecting which rule(s) to fire, and in what order, and with how
much of each resource available, can also be implemented as an entire heuristic rule system
procedure.


[Diagram: a cycle. Situation -> Locate relevant heuristics -> Apply chosen heuristic(s) ->
Changes to the situation (hopefully for the better, hopefully quickly) -> Situation]

Figure 11. The 0th-order interpreter for a body of heuristic rules
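The cycle of Figure 11 can be sketched in a few lines. This is an illustration under stated assumptions, not the interpreter of any actual system: the `Heuristic` tuple, its `power` field, the rule "pick the triggered heuristic with the highest power", the dictionary representation of a situation, and the two toy rules are all hypothetical.

```python
from typing import Callable, NamedTuple

Situation = dict  # hypothetical representation: a plain dict of facts


class Heuristic(NamedTuple):
    name: str
    if_part: Callable[[Situation], bool]         # does the rule trigger here?
    then_part: Callable[[Situation], Situation]  # returns the changed situation
    power: float                                 # assumed rough estimate of utility


def run(situation, heuristics, max_cycles=10):
    """Locate relevant heuristics, apply the chosen one, and repeat."""
    trace = []
    for _ in range(max_cycles):
        relevant = [h for h in heuristics if h.if_part(situation)]
        if not relevant:
            break  # nothing triggers, so the cycle stops
        chosen = max(relevant, key=lambda h: h.power)  # assumed selection policy
        situation = chosen.then_part(situation)
        trace.append(chosen.name)
    return situation, trace


# Two toy rules, invented purely to exercise the loop.
rules = [
    Heuristic("add-one",
              lambda s: s["n"] < 3,
              lambda s: {**s, "n": s["n"] + 1},
              power=1.0),
    Heuristic("mark-done",
              lambda s: s["n"] >= 3 and not s.get("done", False),
              lambda s: {**s, "done": True},
              power=2.0),
]

final, trace = run({"n": 0}, rules)
print(final, trace)
```

Both the list comprehension that locates relevant rules and the `max` call that selects one are placeholders; as the text notes, either step could itself be carried out by another copy of the same loop working with heuristics about heuristics.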


By examining the loop in Figure 11, we can quickly "read off" the possible bugs in heuristics, the
list of ways in which a heuristic can be "bad":

It might not be interpretable at all.

It might be interpretable but it might never even be potentially relevant.

It  might  be  potentially  relevant  but  its  IF  part  might  never  be  satisfied. 

It  might  trigger,  but  never  be  the  rule  actually  selected  for  execution  (firing). 

It  might  fire,  but  its  THEN  part  might  not  produce  any  effect  on  the  situation. 

It  might  produce  a  bad  effect  on  the  situation. 

It  might  produce  a  good  effect,  but  take  so  long  that  it’s  not  cost-effective. 

This is reminiscent of John Seely Brown's work on a generative theory of bugs [Brown & VanLehn
80],  and  is  meant  to  be.  Perhaps  by  viewing  heuristics  as  performers,  this  approach  can  lead  to  an 
effective  method  for  diagnosing  buggy  heuristics,  hence  improving  or  eliminating  them. 
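The list of bugs above follows the stages of the loop in order, so a diagnosis can report the first stage at which a heuristic fails. The sketch below assumes that usage statistics of this kind have been recorded for each heuristic; the `Record` fields and the `diagnose` function are hypothetical names, not part of any existing system.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical usage statistics gathered for one heuristic."""
    interpretable: bool   # could the interpreter make sense of it at all?
    times_considered: int  # times it was judged potentially relevant
    times_triggered: int   # times its IF part was satisfied
    times_fired: int       # times it was actually selected for execution
    times_effective: int   # times its THEN part changed the situation
    benefit: float         # assumed measure of total good done (negative if harmful)
    cost: float            # assumed measure of total resources spent


def diagnose(r: Record) -> str:
    """Return the earliest stage of the loop at which the heuristic fails."""
    if not r.interpretable:
        return "not interpretable"
    if r.times_considered == 0:
        return "never potentially relevant"
    if r.times_triggered == 0:
        return "IF part never satisfied"
    if r.times_fired == 0:
        return "triggers but is never selected"
    if r.times_effective == 0:
        return "fires but has no effect"
    if r.benefit <= 0:
        return "produces a bad effect"
    if r.benefit < r.cost:
        return "good effect but not cost-effective"
    return "no bug detected"


print(diagnose(Record(True, 12, 4, 0, 0, 0.0, 0.0)))
```

Comparing `benefit` against `cost` on a single scale is a simplification made for the example; the text does not say how either quantity would be measured.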

There are several things wrong with the 0th-order theory: it presumes that knowledge is complete
and unchanging; that is, it ignores the "potato in the tailpipe" problem and the frame problem.
The reader may have noticed that the first of the two corollaries in Figure 10 is almost precisely the
negation of the empirically-derived statement (vi) in Figure 9. The latter claims that the space of
task domains is inherently and profoundly quantized; the corollary claims it's continuous. As we
said earlier, the items in Figures 8 and 9 are 2nd-order correction terms to a theory of heuristics,
and Figure 10 is a very simplified 0th-order theory. Intermediate between them lies a theory which
interfaces to each.

That 1st-order theory says that the 0th-order theory is often a very useful fiction. It is cost-effective
to behave as if it were true, if you are in a situation where your state of knowledge is very
incomplete, where there is nevertheless a great quantity of knowledge already known, where the
task is very complex, etc. At an earlier stage, there may have been too little known to express very
many heuristics; much later, the environment may be well enough understood to be algorithmized;
in between, heuristic search is a useful paradigm. Predicting eclipses has passed into this final stage
of algorithmization; medical diagnosis is in the middle stage where heuristics are useful; building
programs to search for new representations of knowledge is still pre-heuristic.


1st: IF you are in a complex, knowledge-rich, incompletely-understood world,
THEN it is frequently useful to behave as though it were true that
appropriateness(action, situation) is continuous in both variables.

Figure 12. The first-order theory of heuristics: the 0th-order theory is a useful fiction


Notice that the 1st-order theory is itself a heuristic! This is not too disturbing, since it is dubious
that  we  will  ever  know  enough  about  thinking  to  supplant  it.  Until  your  model  of  me  is  absolutely 
perfect,  your  predictions  of  my  behavior  will  diverge  more  and  more  as  time  proceeds,  and  after  a 
relatively  short  interval  you  will  have  to  rely  upon  heuristics  again  to  understand  and  predict  my 
thoughts  and  actions.  And  there  is  probably  something  akin  to  Heisenberg’s  uncertainty  principle  to 
guarantee  that  your  model  of  me  can  never  be  perfectly  complete. 

The  second-order  corrections  in  Figures  8  and  9  now  apply  to  the  first-order  theory,  and  in  addition 
some new second-order ones are apparent. For instance, the adjective "frequently", used in Figure
12,  can  be  replaced  by  a  body  of  rules  which  govern  when  it  is  and  is  not  useful  to  behave  so. 


5.2  The  Power  of  Each  Individual  Heuristic 

We have discussed the nature of using a corpus of heuristics, but what is the nature of a single one?
We've already said that it has some domain of relevance. What does that mean? If we graph the
utility or power of the heuristic, as a function of task domain, we would expect a curve resembling
that of Figure 13. Namely, there is some range of tasks for which the heuristic has positive value.
Outside of this, it is often counterproductive to use the heuristic (although the utility may drop to
zero rather than falling below zero as pictured). For tasks sufficiently far away, the utility
approaches zero, because the heuristic is never even considered potentially relevant, hence never
fires. As one example, consider the heuristic that says "If you want to test a conjecture, Then draw
a diagram". As we've seen, this has high utility within Euclidean plane geometry, but as the axioms
of the theory are changed, its worth declines. By the time you reach point-set topology or real
analysis, its value is negative. Eventually, domains like philosophy are reached, where drawing
diagrams can rarely be done meaningfully. (As Figs. 13-15 indicate, we hope that "draw a diagram"
is a good heuristic for the domain of Heuristics.) As another example, consider the heuristic "If a
predicate rarely returns True, Then define new generalizations of it". This is useful in set theory,
worse than useless in number theory, and useless in domains where "predicate" is undefined.


Figure  13.  The  graph  of  a  heuristic’s  power,  as  a  function  of  the  task  it  is  applied  to. 

Figure 14. The change in power when a heuristic (*) has its THEN-part specialized (+)


If we specialize the THEN-part of a heuristic, it will typically have higher utility but only be
relevant over a narrower domain. See Figure 14. Notice the area under the curve appears to
remain roughly constant; this is a geometric interpretation of the tradeoff between generality and
power of heuristic rules. It is also worth noticing that the new specialized heuristic may have
negative utility in regions where the old general one was still positive, and it will be meaningless
over a larger region as well. Consider for example the case where "Generalize a predicate" is
specialized into "Generalize a predicate by eliminating one conjunct from its definition". The latter
is more powerful, but only applies to predicates defined conjunctively.

By examining Figure 14, it is possible to generate a list of possible bugs that may occur when the
actions (THEN-part) of a heuristic are specialized. First, the domain of the new one may be so
narrow that it is merely a spike, a delta function. This is what happens when a general heuristic is
replaced by a table of specific values. Another bug is if the domain is not narrowed at all; in such
a case, one of the heuristics is probably completely dominated by the other. A third type of bug
appears when the new heuristic has no greater power than the old one did. For example, "Smack a
vu-graph projector if it makes noise" has much narrower domain, but no higher utility, than the
more general heuristic "Smack a device if it's acting up". Thus, the area under the curve is greatly
diminished.
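The three pitfalls just listed can be checked mechanically once power curves are available. The sketch below assumes a power curve is represented as a dictionary from task domain to utility, with zero meaning "not relevant"; the function name `then_part_bugs` and every number in the example curves are invented for illustration.

```python
def then_part_bugs(old, new):
    """List the pitfalls a THEN-part specialization appears to exhibit.

    `old` and `new` map task domains to utility; zero means "not relevant".
    """
    def domain(curve):
        return {task for task, utility in curve.items() if utility != 0}

    bugs = []
    if len(domain(new)) == 1:
        bugs.append("domain narrowed to a spike")
    if domain(new) >= domain(old):  # new covers everything old did
        bugs.append("domain not narrowed at all")
    if max(new.values()) <= max(old.values()):
        bugs.append("no greater power than before")
    return bugs


# Made-up utilities for the two examples in the text.
smack_device = {"projector": 0.25, "television": 0.25, "vending machine": 0.25}
smack_projector = {"projector": 0.25, "television": 0.0, "vending machine": 0.0}

generalize = {"set theory": 0.5, "number theory": -0.25, "geometry": 0.25}
drop_conjunct = {"set theory": 1.0, "number theory": -0.5, "geometry": 0.0}

print(then_part_bugs(smack_device, smack_projector))
print(then_part_bugs(generalize, drop_conjunct))
```

A reported spike is a warning rather than a verdict, since a heuristic that applies to a single case can still be worth keeping.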

While the last paragraph warned of some extreme bad cases of specializing the THEN-part of a
heuristic, there are some extreme good cases which frequently occur. The utility (power) axis may
have some absolute desirable point along it (e.g., some guarantee of correctness, or optimal
efficiency), and by specializing the heuristic it may exceed that threshold (albeit over a narrow
range of tasks). In such a case, the way we qualitatively value that heuristic may alter, and indeed
we may term it a method, or an algorithm. One way to rephrase this is to say that algorithms are
merely heuristics which are so powerful that guarantees can be made about their use. Conversely,
one can try to apply an algorithm outside its region of applicability, in which case the result may be
useful and that algorithm is then being used as a heuristic. The latter is frequently done in
mathematics (e.g., pretending one can differentiate power series expansions to guess at the value of
the series). Finally, note that the specialization of the heuristic to one which applies only on a set
of measure zero is not necessarily a bad thing; tables of values do have their uses.

Specializing the IF-part of a heuristic rule results in its having a smaller region of non-zero utility.
That is, it triggers less frequently. As Figure 15 shows, this is like placing a filter or window along
the x-axis, outside of which the power curve will be absolutely zero. In the best of cases, this
removes the negative-utility regions of the curve, and leaves the positive regions untouched. For
example, we might preface the "Draw a diagram" heuristic with a new premise clause, "If you are
asked to test a geometry conjecture". This will cause us to use the rule in Geometry situations,
where it has been found to have a high utility.

Figure 15. The graph of a heuristic's power, after its IF-part has been optimally specialized.


By examining Figure 15, we can generate a list of possible bugs arising from specializing the
conditions (IF-part) of a heuristic rule. The new window may be narrowed to a spike, thus
preventing the rule from almost ever firing. There may be no narrowing whatsoever; in that case,
it  typically  would  add  a  little  to  the  time  required  to  test  the  IF-part  of  the  rule,  while  not  raising 
the  power  at  all.  Of  course  the  most  serious  error  is  if  it  clips  away  some  -  or  all!  -  of  the  positive 
region.  Thus,  we  would  not  want  to  replace  a  general  diagram-drawing  recommendation  with  one 
which  advised  us  to  do  so  only  for  real  analysis  conjectures. 
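The window picture can be made concrete with a small sketch. It assumes a power curve is a dictionary from task domain to utility; `specialize_if_part` is a hypothetical helper, and the utilities given to "draw a diagram" are invented numbers chosen only to match the qualitative story in the text.

```python
def specialize_if_part(power, window):
    """Return the power curve after adding a premise that admits only `window`.

    Outside the window the rule never triggers, so its utility is exactly zero.
    """
    return {task: (utility if task in window else 0.0)
            for task, utility in power.items()}


# Illustrative utilities only; the paper gives no numbers.
draw_a_diagram = {
    "plane geometry": 0.9,
    "point-set topology": -0.6,
    "real analysis": -0.5,
    "philosophy": 0.0,
}

good = specialize_if_part(draw_a_diagram, {"plane geometry"})
bad = specialize_if_part(draw_a_diagram, {"real analysis"})

print(good)  # negative regions removed, positive region untouched
print(bad)   # positive region clipped away, a negative one kept
```

The `bad` case is the serious error described above: the new premise keeps a region where the rule does harm and discards the one region where it helps.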

The space of domains is granular, quantized, hence these power curves are step-functions (or
histograms) rather than smooth curves as we've drawn them. One implication of this is that there is
a very precise point along the task axis where the utility drops from positive to negative (or zero).
Often, this is a very large, very sudden drop across a single discontinuity in the axis (e.g., when a
product emerges, an expert dies, a theorem is proved).

What are the implications of this simple "theory of heuristics"? One effect is to determine in what
order heuristics should be chosen for execution; this is discussed in the next paragraph. A second
effect is to indicate some very useful slots that each heuristic can and should have, attributes of a
heuristic that can be of crucial importance: the peak power of the rule, its average power, the sizes
of the positive and negative regions (both projections along the task axis (x-axis) and the areas
under the curves), the steepness with which the power curve approaches the x-axis, etc. Let us
take the last attribute to illustrate. Why is it useful to know how steeply the power curve
approaches Utility = 0 (the x-axis)? If this is very steep, then it is worth investing a great amount of
resources determining whether the rule is truly relevant in any situation (for if it is slightly
irrelevant, then it may have a huge negative effect if used). Conversely, if the slope is very gentle,
then very little harm will result from slightly-inappropriate applications of the rule, hence not much
time need ever be spent worrying about whether or not it's truly relevant to the situation at hand.
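As a rough illustration of the slots named above, the sketch below computes them from a step-function power curve. It assumes the curve is a dictionary whose insertion order is the order of domains along the task axis and that each domain has unit width; the slot names, the `summarize` function, and the example utilities are all hypothetical.

```python
def summarize(power):
    """Compute a few candidate slots from a step-function power curve.

    `power` maps task domains to utility; its insertion order is taken to be
    the order of domains along the task axis, each of unit width.
    """
    values = list(power.values())
    positive = [u for u in values if u > 0]
    negative = [u for u in values if u < 0]
    steps = [abs(a - b) for a, b in zip(values, values[1:])]
    return {
        "peak_power": max(values),
        "average_power": sum(values) / len(values),
        "positive_breadth": len(positive),  # projection on the task axis
        "negative_breadth": len(negative),
        "positive_area": sum(positive),     # area under the curve
        "negative_area": sum(negative),
        "steepest_step": max(steps, default=0.0),  # crude stand-in for steepness
    }


# Invented utilities for "draw a diagram", ordered along an assumed task axis.
curve = {
    "plane geometry": 1.0,
    "point-set topology": -0.5,
    "real analysis": -0.25,
    "philosophy": 0.0,
}

print(summarize(curve))
```

A large `steepest_step` next to the positive region corresponds to the steep case in the text, where it pays to check relevance carefully before applying the rule.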

The  whole  process  of  drawing  the  power  curves  for  heuristics  is  still  conjectural.  While  a  few  such 
graphs  have  been  sketched,  there  is  no  algorithm  for  plotting  them,  no  library  of  thousands  of 
catalogued  and  plotted  heuristics,  not  even  any  agreement  on  what  the  various  power  and  task  axes 
should be. Nevertheless, it has already proven to be a useful metaphor, and has suggested some
important properties of heuristics which should be estimated (such as the just-mentioned downside
risk of applying a heuristic in a slightly inappropriate situation). It is a qualitative, empirical theory
[Newell & Simon 76], and predicts the form that a quantitative theory might assume.


How should heuristics be chosen for execution? In any given situation, we will be at a point along
the x-axis, and can draw a vertical line (in case of multi-dimensional task axes, we can imagine a
hyperplane). Any heuristics which have positive power (utility) along that line are then useful ones
to apply (according to our theory of heuristics), and the ones with high power should be applied
before the ones with low power. Of course, it is unlikely we would know the power of a heuristic
precisely, in each possible situation; while diagrams such as Figs. 13-15 may be suggestive, the data
are almost never available to draw them quantitatively for a given heuristic. It is more likely that we
would have some measure of the average power of each heuristic, and would use that as a guess of
how useful each one would be in the current situation. Since there is a tradeoff between generality
and power, a gross simplification of the preceding strategy is simply to apply the most specific
heuristic first, and so on. This is the scheme AM used, with very few serious problems. If all
heuristics had precisely the same multiple integral of their power curves, this would coincide with
the previous scheme. Of course, there are always some heuristics which, while being very general,
really are the most important ones to listen to if they ever trigger ("If a conflagration breaks out,
Then escape it").
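The two selection schemes described above can be compared in a short sketch. Everything here is an assumption made for illustration: the `Rule` fields, the integer `specificity` (for example, depth in the generalization hierarchy), the fallback from known power to average power, and all of the numbers.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    name: str
    power_here: Optional[float]  # power in the current situation, if known
    average_power: float         # fallback estimate when power_here is unknown
    specificity: int             # assumed: larger means more specific


def by_estimated_power(rules):
    """Keep rules whose estimated power is positive, highest estimate first."""
    def estimate(rule):
        return rule.average_power if rule.power_here is None else rule.power_here
    return sorted((r for r in rules if estimate(r) > 0), key=estimate, reverse=True)


def most_specific_first(rules):
    """The gross simplification: order by specificity alone."""
    return sorted(rules, key=lambda r: r.specificity, reverse=True)


# Invented values; only the relative ordering matters for the illustration.
rules = [
    Rule("escape the conflagration", 10.0, 0.1, 0),
    Rule("eliminate one conjunct", None, 0.8, 3),
    Rule("generalize a predicate", None, 0.4, 2),
    Rule("draw a diagram", -0.5, 0.3, 1),
]

print([r.name for r in by_estimated_power(rules)])
print([r.name for r in most_specific_first(rules)])
```

The conflagration rule comes first under the power-based ordering and last under the specificity-based one, which is exactly the kind of very general but overriding heuristic the text warns about.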

Notice that the "generality vs. power" tradeoff has turned into a statement about the conservation
of volumes in n×m-dimensional space, when one takes the multiple integral of all the power curves
of a heuristic. In particular, there are tradeoffs among all the dimensions: a gain along some utility
dimension (say Convincingness) can be paid for by a decrease along another (say Efficiency) or by a
decrease along a task dimension (a reduction of breadth of applicability of the heuristic). One
historically common bug has been over-reliance upon (and glorification of) heuristics which are
pathologically extreme along some dimension: tables, algorithms, weak methods, etc.

Heuristics  are  often  spoken  of  as  if  they  were  incomplete,  uncertain  knowledge,  much  like 
mathematical  conjectures  or  scientific  hypotheses.  This  is  not  necessarily  so.  The  epistemological 
status  of  a  heuristic,  its  justification,  can  be  arbitrarily  sound.  For  example,  by  analyzing  the 
optimal  play  of  Blackjack,  a  rather  complex  table  of  appropriate  actions  (as  a  function  of  situation) 
is  built  up.  One  can  simplify  this  into  a  "Basic  Strategy"  of  just  a  few  rules,  and  know  quite 
precisely just how well those rules should perform. That is, heuristics may be built up from
systematic,  exhaustive  search,  from  "complete"  hindsight.  Another  example  of  the  formal,  complete 
analysis  of  heuristic  methods  is  well  known  from  physics,  where  Newtonian  mechanics  is  known  to 
be  only  an  approximation  to  the  world  we  inhabit.  Relativistic  theories  quantify  that  deviation 
precisely.  But  rather  than  supplanting  Newtonian  physics,  they  bolster  its  use  in  everyday  situations, 
where its inadequacies can be quantitatively shown to be too small to make worthwhile the
additional  computation  required  to  do  relativistic  calculations. 

Many,  nay  most,  heuristics  are  merely  conjectural,  empirical,  aesthetic,  or  in  other  ways 
epistemologically  less  secure  than  the  Basic  Strategy  in  Blackjack  and  Newtonian  physics.  The 
canonical use of heuristics is to pretend they are true; the canonical use of a conjecture is to guide a
search  for  a  proof  of  it.  If  a  conjecture  turns  out  to  be  false,  it  may  yet  stand  as  a  useful  heuristic. 


5.3  The  Space  of  Heuristics 

The utility of an entire set of heuristics can be graphed as a function of the tasks it's being applied
to, and, not surprisingly, produces a curve similar to the one in Figure 13. Hopefully, the set of
heuristics is more useful than any member; thus it is probably much broader and taller (or less
negative) than any single heuristic inside it. One cannot simply "add" the curves of its members;
the interactions among heuristics are often quite strong, and independence is the exception rather
than the rule. Often, two heuristics will be different methods for getting to the same place, or one
will be a generalization or isomorph of the other, etc., and as a result the set will really not benefit
very much from having both of them present. On the other hand, sometimes heuristics interact
synergistically, and the effects can be much greater than simple superposition would have predicted.
The opposite of this sometimes happens: two experts have given you heuristics which separately
work, yet which contradict each other. Using either half-corpus would solve your problem, but
mixing them causes chaos (e.g., one mathematician gives you heuristics for finding empirical
examples and generalizing, while a second gives you heuristics for formally axiomatizing the
situation; either may suffice, but the unstructured mixing of the two sets can be catastrophic).

No treatment of heuristics can be complete without some consideration of the space of all the
world's heuristics. Consider arranging them in a generalization/specialization hierarchy, with the
most general ones at the top. At that top level lie the so-called weak methods (generate & test,
hill-climbing, matching, means-ends analysis, etc.). At the bottom are millions of very specific
heuristics, involving domain-specific terms like "King-side" and "DDT". In between are heuristics
such as those illustrated in Figure 16. A purely "legal-move" estimate of the size of this tree gives
a huge final number. Based on the lengths and vocabularies of heuristic rules in AM, one may
suppose that there are about 20 blanks to be filled in in a typical heuristic, and about 100 possible
entries for each blank (predicate, argument, action, etc.) related to AM's math world. So there are
100^20 = 10^40 syntactically well-formed heuristics just in the elementary mathematics corner of the
tree. Of course, most of these are never (thankfully!) going to fire, and almost all the rest will
perform irrelevant actions when they do fire. From now on, let's restrict our attention to the tree of
only those heuristics which have positive utility at least in some domains.
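
The size estimate above is plain exponentiation: each of roughly 20 blanks is filled independently from roughly 100 candidates. A quick check, with Python used only as a calculator and with illustrative variable names that do not come from AM:

```python
# Rough count of syntactically well-formed heuristics in AM's math world,
# using the two estimates quoted in the text.
blanks_per_heuristic = 20    # blanks to fill in a typical heuristic
entries_per_blank = 100      # candidate fillers for each blank

well_formed = entries_per_blank ** blanks_per_heuristic

# A power of ten has one more digit than its exponent.
exponent = len(str(well_formed)) - 1
print(f"about 10^{exponent} candidate heuristics")
```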

What does that tree actually look like? One can take a specific heuristic and generalize it gradually,
in all possible ways, until all the generalizations collapse into weak methods. Such a preliminary
analysis led us to expect the tree to be of depth about 50, and in the case of an expert system with
a corpus of a thousand rules, we might expect a picture of them arranged so as to form an equilateral
triangle. But if one draws the power curves for the heuristics, it quickly becomes apparent that
most generalizations are no less powerful than the rule(s) beneath them! Thus the specific rule can
be eliminated from the tree. The resulting tree has depth of roughly 3 or 4, and is thus incredibly
shallow and bushy. Professors Herbert Simon, Woody Bledsoe, and the author analyzed the 243
heuristics from AM, and were able to transform their deep (depth 12) tree into an equivalent one
containing fewer than fifty rules and having depth of only four. Looking at a few heuristics arranged
in a tiny tree (Fig. 16), we can see that all but the top and bottom levels can be eliminated. A
similar phenomenon was seen earlier, in the case of a heuristic which said to smack a vu-graph
projector in case it acted up; it and several levels of its generalizations can be eliminated, since they
are no more powerful than the general "Smack a malfunctioning device" heuristic. Some very
specific rule, such as "Smack a Chinook 807 vu-graph projector on its right side if it hums", might
embody some new, powerful, specific knowledge (such as the location of the motor mount and this
brand's tendency to misalign), and thus need to stay around.
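
The collapse described above can be sketched in a few lines. This is only an illustration under strong simplifying assumptions: each rule's power curve is reduced to a single hypothetical number, each rule has exactly one generalization, and the rule names and scores are invented for the example. It is not the analysis actually carried out on the AM heuristics.

```python
from typing import Dict, Optional

Tree = Dict[str, Optional[str]]   # rule name -> its generalization (None for a root)

def level(tree: Tree, name: str) -> int:
    """1 for a root, 2 for its immediate specializations, and so on."""
    parent = tree[name]
    return 1 if parent is None else 1 + level(tree, parent)

def depth(tree: Tree) -> int:
    """Number of levels on the longest chain in the tree."""
    return max(level(tree, name) for name in tree)

def prune(tree: Tree, power: Dict[str, float]) -> Tree:
    """Keep a rule only if it is more powerful than its nearest surviving generalization."""
    kept: Tree = {}
    for name in sorted(tree, key=lambda n: level(tree, n)):   # most general first
        ancestor = tree[name]
        while ancestor is not None and ancestor not in kept:
            ancestor = tree[ancestor]                          # skip eliminated rules
        if ancestor is None or power[name] > power[ancestor]:
            kept[name] = ancestor                              # re-attach under the survivor
    return kept

# Hypothetical scores: only the most specific rule carries extra knowledge.
tree: Tree = {
    "smack-device": None,
    "smack-projector": "smack-device",
    "smack-vu-graph-projector": "smack-projector",
    "smack-chinook-807-right-side": "smack-vu-graph-projector",
}
power = {
    "smack-device": 5.0,
    "smack-projector": 5.0,
    "smack-vu-graph-projector": 5.0,
    "smack-chinook-807-right-side": 8.0,
}

pruned = prune(tree, power)
print(depth(tree), "levels before pruning,", depth(pruned), "after")
```

Only the top rule and the knowledge-bearing bottom rule survive, which is the shallow, bushy shape the text reports.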

This "shallow-tree" result should make advocates of weak methods happy, because it means that
there  really  is  something  special  about  that  top  level  of  the  hierarchy.  Going  even  one  level  down 
means  paying  attention  not  to  an  additional  ten  or  twenty  heuristics,  but  to  hundreds.  It  should 
also  please  the  knowledge  engineering  advocates,  since  most  of  the  very  specific  domain-dependent 
rules  also  had  to  remain.  It  appears,  however,  to  be  a  severe  blow  to  those  of  us  who  wish  to 
automatically  synthesize  new  heuristics  via  specialization,  since  the  result  says  that  that  process  is 
usually  going  to  produce  something  no  more  useful  than  the  rule  you  start  with.  Henceforth,  we 
shall  term  this  the  shallow-tree  problem. 

There are two ways out of this dilemma, however. Notice that "utility of a heuristic" really has
several distinct dimensions: efficiency, flexibility, power for pedagogical purposes, usefulness in
future specializations and generalizations, etc. Also, "task features" has several dimensions: subject
matter, resources allotted (user's time, cpu time, space, etc.), degree of complexity (e.g., consider
Knuth's numeric rating of his problems' difficulty), time (i.e., date in history), paradigm, etc. If
there are n utility dimensions and m task dimensions, then there are actually n x m different power
curves to be drawn for each heuristic. Each of them may resemble the canonical one pictured in
Figure 13. If by specializing a heuristic we create one which has the appearance of Figure 14 in
any one of these n x m graphs, then it is a useful specialization. So, while a specialization is unlikely
to be useful in any particular utility/task graph, it is quite likely to be useful according to some one
of the n x m such graphs.
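
The closing claim of this paragraph can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that a specialization has the same small chance p of looking useful in each graph and that the graphs are independent; the paper makes neither assumption, and the values of p, n and m are invented.

```python
def chance_useful_somewhere(p: float, n: int, m: int) -> float:
    """P(useful in at least one of the n*m utility/task graphs), assuming independence."""
    return 1.0 - (1.0 - p) ** (n * m)

single = 0.05      # hypothetical chance of being useful in one particular graph
n_utility = 5      # hypothetical number of utility dimensions
m_task = 6         # hypothetical number of task dimensions

overall = chance_useful_somewhere(single, n_utility, m_task)
print(f"one graph: {single:.2f}; some one of {n_utility * m_task} graphs: {overall:.2f}")
```

Even a specialization that rarely helps on any single utility/task pairing becomes likely to help on at least one pairing once there are a few dozen to choose from.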

Figure 16. A tiny fragment of the graph of all heuristics, related by Generalization/Specialization.
Note  the  similar  derivation  of  Coalescing  and  Fixed-Points  heuristics. 


Consider  the  Focus  of  Attention  heuristic,  that  is,  one  which  recommends  pursuing  a  course  of 
action  simply  because  it’s  been  worked  on  recently.  Using  this  as  one  reason  to  support  tasks  on  its 
agenda  made  AM  appear  more  intelligent  to  human  observers,  yet  actually  take  longer  to  make  any 
given discovery. Thus, it is useful in the "Convincingness" dimension of utility, but may be
harmful  vis  a  vis  "Efficiency". 

As  another  example,  consider  the  heuristics  "Smack  a  vu-graph  projector  if  it’s  acting  up",  "Smack 
a  child  if  it’s  acting  up",  and  "Smack  a  vu-graph  projector  or  child  if  it’s  acting  up".  There  may  be 
some utility dimensions in which the third of those is best (e.g., scope, humor). However, the
rationale or justification for the first two heuristics is quite different (random perturbation toward
stable state versus reinforcement learning). Therefore the third heuristic is probably going to be
deficient  along  other  utility  dimensions  (clarity,  usefulness  for  analogizing). 

But there is an even more basic way in which the "shallow tree" problem goes away. There are
really a hundred different useful relationships that two heuristics can have connecting them
(Possibly-triggers, More-restrictive-IF-part, Faster, My-average-power-higher-than-your-peak-power,
Asks-fewer-questions-of-the-user, etc.). For each such relation, an entire graph (note that even the
Genl/Spec relation generated a graph, not a tree; see Fig. 16) can be drawn of all the world's
heuristics (or all those in some given program). In some of these trees or graphs, we will find the
broad, shallow grouping that was found for the AM heuristics under Genl/Spec. For others, such
as Possibly-Triggers, we may find each rule pointing to a small collection of other rules, and hence
the depth would be quite large. There are still many difficult questions to study, even with the
theory in this primitive state; e.g., how does the shape of the tree (the graph of heuristics related by
some attribute R) relate to the ways in which R ultimately proves itself to be useful or not
useful? Already, one powerful correlation seems to be recognized: in cases where the tree depth is
great, that relation is a good one to generalize and specialize along; in cases where the resulting tree
is very broad and shallow, other methods (notably analogy) may be more productive ways of getting
new heuristics.


6.  Heuristics  used  to  develop  new  heuristics 


6.1 Meta-Heuristics are Just Heuristics

Assuming that "Heuristics" is another field of knowledge, just like Electronics or Mathematics, it
should be possible to discover new ones and to modify existing ones by employing a large corpus of
heuristics. Is there something special about the heuristics which inspect, gather data about, modify,
and synthesize other heuristics? That is, should we distinguish "meta-heuristics" from "domain
heuristics"? According to our general theory, as presented in the last section, domains of knowledge
are granular but nearly continuous along every significant axis (complexity of task, amount of
quantification in the task, degree of formalization, etc.). Thus, our first hypothesis should be that it
is not necessary to differentiate meta-level heuristics from object-level heuristics -- nay, that it may
be artificial and counterproductive to do so.

Figure  17  illustrates  two  heuristics  which  can  deal  with  both  heuristics  and  mathematical  functions. 
The  first  one  says  that  if  some  concept  f  has  always  led  to  bad  results,  then  f  should  be  marked  as 
less valuable. If a mathematical operation, like Compose, has never led to any good new math
concepts,  then  this  heuristic  would  lower  the  number  stored  on  the  Worth  slot  of  the  Compose 
concept.  Similarly,  if  a  heuristic,  like  the  one  for  drawing  diagrams,  has  never  paid  off,  then  its 
Worth  slot  would  be  decremented. 

The second heuristic says that if some concept has been occasionally useful and frequently
worthless, then it's cost-effective to seek new, specialized versions of that concept, because some of
them might be much more frequently utile (albeit in narrower domains of relevance). Composition
of functions is such a math concept -- it has led AM to some of its biggest successes and failures;
this heuristic would add a task to AM's agenda, which said "Find new specializations of Compose".
When it was eventually worked on, it could result in the creation of new functions, such as
"Composition of a function with itself", "Composition resulting in a function whose domain and
range are equal", "Composition of two functions which were derived in the same way", etc. This
second heuristic also applies to heuristics; in fact it applies to itself. It itself is sometimes useful and
sometimes not, and so frequently it truly does pay to seek new, specialized variations of that
heuristic. Four possible specializations are, for example, heuristics which demand that f has proven
itself useful at least 3 times, that f be specialized in an extreme way, that f have proven itself
extraordinarily useful at least once, and that the specializations still be capable of producing any of
the successful past creations of f.


IF  the  results  of  performing  f  have  always  been  numerous  and  worthless, 
THEN  lower  the  expected  worth  of  f 

IF the results of performing f are only occasionally useful,
THEN consider creating new specializations of f


Figure  17.  Two  heuristics  capable  of  working  on  heuristics  as  well  as  math  concepts 
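
To make the point of Figure 17 concrete, here is a minimal sketch in which the same two rules are run over a math function, a heuristic, and a description of the second rule itself. The `Unit` class, the thresholds (five results, a 25% success rate, a 10% penalty) and the Worth values for the two heuristics are assumptions made for illustration; only Compose's Worth of 700 is taken from Figure 19. AM's own rules were Lisp code, not this.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Unit:
    """A frame-like unit; it may stand for a math concept or for a heuristic."""
    name: str
    is_a: str
    worth: int
    results: List[bool] = field(default_factory=list)   # True = a result judged useful

def lower_worth_if_always_worthless(f: Unit, agenda: List[str],
                                    min_results: int = 5, percent: int = 10) -> None:
    """First rule of Figure 17: results are numerous and none was useful."""
    if len(f.results) >= min_results and not any(f.results):
        f.worth -= f.worth * percent // 100

def specialize_if_occasionally_useful(f: Unit, agenda: List[str],
                                      max_rate: float = 0.25) -> None:
    """Second rule of Figure 17: some successes, but only a small fraction."""
    if any(f.results) and sum(f.results) / len(f.results) <= max_rate:
        agenda.append(f"Find new specializations of {f.name}")

compose = Unit("Compose", "Function", 700, [True] + [False] * 5)
draw = Unit("Draw-diagram", "Heuristic", 400, [False] * 6)
# The second rule, described as a unit so that it can be handed to itself.
spec_rule = Unit("Specialize-if-occasionally-useful", "Heuristic", 500,
                 [True, False, False, False])

agenda: List[str] = []
for unit in (compose, draw, spec_rule):
    lower_worth_if_always_worthless(unit, agenda)
    specialize_if_occasionally_useful(unit, agenda)

print(draw.worth)
print(agenda)
```

Neither rule inspects the `is_a` field: all that matters is that the unit carries the slots the rule reads and writes, which is why no separate meta-level machinery is needed.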


6.2  Attributes  of  a  Heuristic 

In AM, heuristics examine existing frame-like concepts, and lead to new and different concepts. To
have heuristics operate on and produce heuristics, it suffices to represent each heuristic as a
full-fledged frame-like concept. Thus, the first heuristic listed in Figure 17 needs to reset the value of
the Worth slot of the concept f it operates on, and even if f is a heuristic it must have a Worth slot.
Similarly, a heuristic that referred to such slots as Average-running-time, Date-created, Is-a-kind-of,
Number-of-instances, etc. could only operate upon units (be they mathematical functions or
heuristics) having such slots. Figure 18 illustrates (some of the slots from) a heuristic represented in
that way. Notice its similarity to the representation of a mathematical operation (Figure 19). The
heuristic resembles the function (compare Figs 18-19) much more than the math function resembles
the static math concept (compare Figs 19-20).

Earlier, we defined a heuristic to be a contingent piece of guidance knowledge: In some situation,
here are some actions that may be especially fruitful, and here are some that may be extremely
inappropriate. While some heuristics have pathological formats (e.g., algorithms which lack
contingency; delta function spikes which can be succinctly represented as tables), most heuristics
seem to be naturally stated as rules having the format "IF conditions, THEN actions." As the body
of heuristics grows, the conditions fall into a few common categories (testing whether the rule is
potentially relevant, testing whether there are enough available resources to expect the rule to work
successfully to completion, etc.) and so do the actions (add new tasks to the agenda, print
explanatory messages, define new concepts, etc.). Each of these categories is worth making into a
separate named attribute which heuristic rules can possess; Sections 6.3 and 7 will show the power
which can arise from drawing such distinctions. So instead of a heuristic having an IF slot and a
THEN slot, it will have a bundle of slots which together comprise the conditions of applicability of
the heuristic, and another bundle of slots which comprise the actions. See Figure 18. It is also
worth defining compound slots in terms of these: a composite IF part, a composite THEN part, a
combined IF/THEN lump of Lisp code, a compiled version of the same, etc.
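
One way to picture the compound slots is as functions computed from the bundles of IF- and THEN- slots. The sketch below is an assumption-laden illustration rather than AM's Lisp: a heuristic is a plain dictionary, slot names beginning with "IF-" hold tests, those beginning with "THEN-" hold actions, and the context keys (`true_rate`, `cpu_seconds`, and so on) are invented. The conditions loosely follow the GRP rule of Figure 18, reading "less than 5% of Average Predicate" as a comparison of success rates.

```python
from typing import Any, Dict

Context = Dict[str, Any]
Heuristic = Dict[str, Any]

def composite_if(heuristic: Heuristic, ctx: Context) -> bool:
    """Composite IF part: every IF-* slot must accept the context."""
    return all(test(ctx) for slot, test in heuristic.items() if slot.startswith("IF-"))

def composite_then(heuristic: Heuristic, ctx: Context) -> None:
    """Composite THEN part: run every THEN-* slot in order."""
    for slot, action in heuristic.items():
        if slot.startswith("THEN-"):
            action(ctx)

def fire(heuristic: Heuristic, ctx: Context) -> bool:
    """Combined IF/THEN: act only when all conditions hold."""
    if not composite_if(heuristic, ctx):
        return False
    composite_then(heuristic, ctx)
    return True

grp: Heuristic = {
    "IF-potentially-relevant": lambda c: c["just_finished"] == "predicate",
    "IF-truly-relevant": lambda c: c["true_rate"] < 0.05 * c["average_true_rate"],
    "IF-resources-available": lambda c: c["cpu_seconds"] >= 10 and c["cells"] >= 300,
    "THEN-add-task-to-agenda":
        lambda c: c["agenda"].append("Fill in Generalizations of " + c["name"]),
    "THEN-modify-slots": lambda c: c.update(worth=c["worth"] * 90 // 100),
    "WORTH": 600,   # descriptive slot; ignored by the composite parts
}

rich: Context = {"name": "Set-Equality", "just_finished": "predicate",
                 "true_rate": 0.001, "average_true_rate": 0.1,
                 "cpu_seconds": 12, "cells": 400, "agenda": [], "worth": 500}
starved: Context = dict(rich, cpu_seconds=5, agenda=[])   # too little cpu time

fired_rich = fire(grp, rich)
fired_starved = fire(grp, starved)
print(fired_rich, rich["agenda"], rich["worth"])
print(fired_starved, starved["agenda"], starved["worth"])
```

Keeping the conditions in separately named slots means a cheap test (is the rule potentially relevant?) can be inspected, reordered, or modified by another rule without touching the rest.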

All the previous attributes have been effective, executable conditions and actions. These are
paramount, since they serve to define the heuristic -- they are the criterial slots. Many non-effective,
non-criterial slots are important as well, for describing the heuristic. Some of these relate the
heuristic to other heuristics (Generalizations, Specializations), classes of heuristics (Isa), and
non-heuristic concepts (View). Several slots record its origins (Defined-using, Creation-date) and the
case studies of its uses so far (Examples).

Once a rich stock of slots (types of attributes) is present for heuristics, several new ones can be
derived from them in two ways. First, one can take a slot and ask some questions about it: how
does it evolve over time in length, what relationships exist among entries that fill it, how useful are
those values, etc. Each such question spawns a new kind of slot (AvgNumberOfExtremeExamples,
RelnsAmongMyExtremeExamples, AvgWorthOfExtremeExamples). Second, one can take a pair of
slots (say ThenConjecture and IfTrulyRelevant) and a relation (such as Implies) and define a new
unary function on heuristics -- a new kind of slot that any heuristic can have -- where H1 would list
H2 as an entry on that slot only if (in the present case) the ThenConjecture slot of H1 Implies the
IfTrulyRelevant slot of H2. A good name for this new slot might be "CanTrigger", because it lists
some heuristics which might trigger when H1 is fired. Of course not all of the n^2 "cross-term" type
slots are going to be useful, but this provides a generator for a large space of potentially worthwhile
new slots. Some heuristics can guide the system in selecting plausible ones to define, monitoring
the utility of each selection, and obliterating any which appear empirically rarely to lead to any
significant future solutions or discoveries. An example of such a process is given in Section 7.
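
The second construction, a "cross-term" slot built from two existing slots and a relation, might be sketched as follows. Everything concrete here is an assumption for illustration: slot values are modelled as sets of atomic statements, Implies is approximated by set containment, and the heuristics other than GRP are invented.

```python
from typing import Callable, Dict, List, Set

Unit = Dict[str, Set[str]]   # slot name -> entries on that slot

def derive_cross_slot(units: Dict[str, Unit], slot_a: str, slot_b: str,
                      relation: Callable[[Set[str], Set[str]], bool]) -> Dict[str, List[str]]:
    """For each unit h1, list every unit h2 such that relation(h1.slot_a, h2.slot_b) holds."""
    return {
        name1: [name2 for name2, h2 in units.items()
                if relation(h1.get(slot_a, set()), h2.get(slot_b, set()))]
        for name1, h1 in units.items()
    }

def implies(conjectured: Set[str], required: Set[str]) -> bool:
    """Crude stand-in for Implies: every required statement has been conjectured."""
    return bool(required) and required <= conjectured

units: Dict[str, Unit] = {
    "GRP": {
        "ThenConjecture": {"P is less interesting than expected"},
        "IfTrulyRelevant": {"P is rarely true"},
    },
    "Deprioritize-dull-predicate": {          # hypothetical heuristic
        "ThenConjecture": set(),
        "IfTrulyRelevant": {"P is less interesting than expected"},
    },
    "Look-for-fixed-points": {                # hypothetical heuristic
        "ThenConjecture": {"Q has a fixed point"},
        "IfTrulyRelevant": {"Q is a function"},
    },
}

can_trigger = derive_cross_slot(units, "ThenConjecture", "IfTrulyRelevant", implies)
print(can_trigger)
```

Because `derive_cross_slot` takes the two slot names and the relation as parameters, it acts as the generator the text mentions: any of the n^2 slot pairs can be tried, and the resulting slot kept or discarded according to how useful it proves.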

NAME: Generalize-rare-predicate
ABBREVIATION: GRP
STATEMENT
   English: If a predicate is rarely true, Then create generalizations of it
   IF-just-finished-a-task-dealing-with: a predicate P
   IF-about-to-work-on-task-dealing-with: an agenda A
   IF-in-the-middle-of-a-task-dealing-with: *never*
      (these 3 attributes comprise IF-potentially-relevant)
   IF-truly-relevant: P returns True less than 5% of Average Predicate
   IF-resources-available: at least 10 cpu seconds, at least 300 cells
   THEN-add-task-to-agenda: Fill in entries for Generalizations slot of P
   THEN-conjecture: P is less interesting than expected
                    Generalizations of P may be better than P
                    Specializations of P may be very bad
   THEN-modify-slots: Reduce Worth of P by 10%
                      Reduce Worth of Specializations(P) by 50%
                      Increase Worth of Generalizations(P) by 20%
   THEN-print-to-user: English(GRP) with "a predicate" replaced by P
   THEN-define-new-concepts:
   CODED-IF-PART: λ(P) ... <LISP function definition omitted here>
   CODED-THEN-PART: λ(P) ... <LISP function definition omitted here>
   CODED-IF-THEN-PARTS: λ(P) ... <LISP function definition omitted here>
   COMPILED-CODED-IF-THEN-PARTS: #30875
SPECIALIZATIONS: Generalize-rare-set-predicate
   Boundary-Specializations: Enlarge-domain-of-predicate
GENERALIZATIONS: Modify-predicate, Generalize-concept
   Immediate-Generalizations: Generalize-rare-contingent-piece-of-knowledge
   Siblings: Generalize-rare-heuristic
IS-A: Heuristic
EXAMPLES:
   Good-Examples: Generalize Set-Equality into Same-Length
   Bad-Examples: Generalize Set-Equality into Same-First-Element
CONJECTURES: Special cases of this are more powerful than Generalizations
   Good-Conjec-Units: Specialize, Generalize
ANALOGIES: Weaken-overconstrained-problem
WORTH: 600
VIEW: Enlarge-structure
ORIGIN: Specialization of Modify-predicate via empirical induction
   Defined-using: Specialize
   Creation-date: 6/1/78 11:30
HISTORY:
   NGoodExamples: 1         NBadExamples: 1
   NGoodConjectures: 3      NBadConjectures: 1
   NGoodTasks-added: 2      NBadTasksAdded: 0
   AvgCpuTime: 9.4 seconds  AvgListCells: 200


Figure 18. Frame-like representation for a heuristic rule from AM. The rule is composed of
nothing but attribute:value pairs. After each attribute or slot (often heavily hyphenated) is a colon,
and then a list of the entries or values for that attribute of the GRP heuristic.


NAME: Compose
ABBREVIATION: -o-
STATEMENT
   English: Compose two functions F and G into a new one FoG
   DOMAIN: F, G are functions   (comprises IF-potentially-relevant)
   IF-truly-relevant: Domain of F and Range of G have some intersection
   IF-resources-available: at least 2 cpu seconds, at least 200 cells
   THEN-add-task-to-agenda: Fill in entries for some slots of FoG
   THEN-conjecture: Properties of F hold for FoG
                    Properties of G hold for FoG
   THEN-modify-slots: Record FoG as an example of Compose
   THEN-print-to-user: English(Compose)
   THEN-define-new-concepts: Name FoG;
      ORIGIN Compose F,G;
      WORTH: Average(Worth(F),Worth(G))
      DEFN: Append(Defn(G),Defn(F))
      Avg-cpu-time: Plus(Avg-cpu(F),Avg-cpu(G))
      IF-Potentially-Rele: If-Potentially-Rele(G)
      IF-Truly-Rele: If-Truly-Rele(G)
   CODED-IF-PART: λ(F,G) ...
   CODED-THEN-PART: λ(F,G) ...
   CODED-IF-THEN-PARTS: λ(F,G) ...
   COMPILED-CODED-IF-THEN-PARTS: #30876
SPECIALIZATIONS: Composition-of-bijections
GENERALIZATIONS: Combine-concepts
   Immediate-Generalizations: Combine-functions
IS-A: Function
EXAMPLES:
   Good-Examples: Compose Count and Divisors
   Bad-Examples: Compose Count and Count
CONJECTURES: Composing F and F is sometimes very good and usually bad
ANALOGIES: Sequence
WORTH: 700
VIEW: Append
ORIGIN: Specialization of Append-concepts with slot=Defn
   Defined-Using: Specialize
   Creation-date: 11/4/75 03:18
HISTORY:
   NGoodExamples: 14        NBadExamples: 19
   NGoodConjectures: 2      NBadConjectures: 1
   NGoodTasks-added: 57     NBadTasksAdded: 34
   AvgCpuTime: 1.4 seconds  AvgListCells: 160


Figure  19.  Frame-like  representation  for  a  mathematical  function  from  AM. 


NAME:  Primes 
STATEMENT 

English:  Numbers  with  two  divisors 
SPECIALIZATIONS:  Odd-primes,  Small-primes,  Pair-primes 
GENERALIZATIONS:  Positive  numbers 
IS-A:  Class  of  numbers 
EXAMPLES: 

Extreme-exs:  2,3 
Extreme-non-exs:  0,1 
Typical-exs:  5,7,11,13,17,19 
Typical-non-exs:  34,  100 
CONJECTURES: 

Good-conjecs:  Unique-factorization,  Formula-for-d(n) 

Good-conjec-units:  Times,  Divisors-of,  Exponentiate,  Nos-with-3-divis,  Squaring 
ANALOGIES:  Simple  Groups 
WORTH:  800 

ORIGIN: Divisors-of^-1 (Doubletons)

Defined-using:  Divisors-of 
Creation-date:  3/19/76  18:45 
HISTORY: 

NGoodExamples:  840 
NBadExamples:  5000 
NGoodConjectures:  3 
NBadConjectures:  7 

Figure  20.  Frame-like  representation  for  a  static  mathematical  concept  from  AM. 


6.3  Discovering  a  New  Heuristic 

The AM heuristics create new concepts via specializing existing ones, generalizing (either from
existing ones or from newly-gathered data), and analogizing. These are the three "directions" new
heuristics will come from. We have exemplified specialization already. One point about
generalization is worth making: Heuristics which serve as plausible move generators originate by
generalizing from past successes; heuristics which prune away implausible moves originate by
generalizing from past failures. Since successes are much less common than failures, it is not
surprising that most heuristics in most heuristic search programs are of the pruning variety. In fact,
many authors define heuristic to mean nothing more than a pruning aid.

One of the typical "common sense number theory" heuristics which AM lacked was the one which
decides that the unique factorization theorem is probably more significant than Goldbach's
conjecture, because the first has to do with multiplication and division, while the latter deals with
addition and subtraction, and Primes is inherently tied up with the former operations. How could
such a heuristic be discovered automatically? This is the starting point for the example we now
begin, an example which concludes in the following section, "Heuristics to develop new
representations". Why should this be so? That is, what in the world does discovering heuristics
have to do with representation of knowledge?

The connection between heuristics and representation is profound. Consider even the special case
where we restrict our representations to frame-like ones. The larger the number of different kinds
of slots that are known about, the fewer keystrokes are required to type a given frame (concept,
unit) in to the system. Thus, if NGoodConjecs weren't known, it might take 40 keystrokes rather
than 1 to assert that there were 3 good conjectures known involving prime numbers. Moreover, no
special-purpose machinery to process such an assertion would be known to the system.

This is akin to the power Interlisp derives from the thickness of its manual, from the huge number
of useful predefined functions. A broad vocabulary streamlines communication. Not only does a
profusion of slot types facilitate entering a concept, it makes it easier to modify it once it's entered.
Finally, it makes it easier to discover it in the first place; think of it as combining terms in a more
powerful, higher level language.

So we see that the task of discovering heuristics should be profoundly accelerated -- or retarded --
by the choice of slots we make for our representation. In the case of an excellent choice of slots, a
new heuristic would frequently be simply a new entry on one slot of some concept. Let's see how
that can be.

Recall that primes were originally discovered by the system as extrema of the function
"Divisors-of". This was recorded by placing the entry "Divisors-of" in the slot called "Defined-using"
on the concept called "Primes" (see Figure 20). Later, conjectures involving Primes were found,
empirically-observed patterns connecting Primes with several other concepts, such as Times,
Divisors-of, Exponentiation, and Numbers-with-3-divisors. This is recorded on the
GoodConjecUnits slot of the Primes concept. Notice that all the entries on Primes' DefinedUsing
slot are also entries on its GoodConjecUnits slot. This recurred several times, that is, for several
concepts besides Primes, and ultimately the heuristic H9 (below) became relevant (its IF-part
became satisfied):

H9: IF (for many units u) most of the entries on u.r are also entries on u.s,
THEN-ASSERT  that  r  is  a  subslot  of  s  (with  justification  H9) 

This heuristic said that it would probably be productive to pretend that DefinedUsing was always a
subslot of GoodConjecUnits; thus, as soon as you define a new concept X in terms of Y, you
should expect there to be some interesting conjectures between X and Y. This new expectation is a
new heuristic; in our old, cumbersome IF/THEN language we might express it by two rules saying:

(A) "IF a concept is created with a value in its DefinedUsing slot,
THEN place that value in its GoodConjecUnits slot, with justification H9."

(B) "IF Y is an entry on the GoodConjecUnits slot of X, but no good conjecture between X and
Y is yet known, THEN propose a task for the agenda, to look for conjectures between X and Y."

The second of these, (B), has nothing to do with DefinedUsing slots. In fact, it is really no more
powerful than a combination of (i) a very general rule that says to verify suspected members of any
given slot, and (ii) enough facts about GoodConjecUnits and Conjectures to know how to apply (i)
to them. The first one, (A), is the "new heuristic" synthesized by H9. It needn't be represented as
shown above; rather, we can simply go to the concept called DefinedUsing (the data structure
which holds all the information the program knows about that kind of slot in general), and record
that one of its Superslots is GoodConjecUnits. We should also give this an explicit justification,
namely H9, since it is a heuristic, not a fact. Figure 21 shows what this record looks like in our
current program. The new heuristic is simply the line or two emboldened below (the
GoodConjecUnits entry under SuperSlots and its justification, H9); all the non-bold
text was present in the program already (though most of it was written by the program itself at
earlier times, not filled in by human hands).
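
The two steps just described, H9 noticing the regularity and rule (A) then acting on the recorded link, can be sketched as follows. This is an illustrative reading, not the program's code: "many units" is taken to mean at least three, "most" to mean more than half, and the Squares and Evens units and their entries are invented. The Primes entries follow Figure 20.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Unit = Dict[str, Set[str]]            # slot name -> entries on that slot
Link = Tuple[str, str, str]           # (subslot, superslot, justification)

def h9(units: Dict[str, Unit], min_units: int = 3, min_fraction: float = 0.5) -> List[Link]:
    """Propose 'r is a subslot of s' when, for many units, most entries on r are also on s."""
    slot_names = sorted({slot for unit in units.values() for slot in unit})
    links: List[Link] = []
    for r, s in permutations(slot_names, 2):
        support = 0
        for unit in units.values():
            r_entries, s_entries = unit.get(r, set()), unit.get(s, set())
            if r_entries and len(r_entries & s_entries) / len(r_entries) > min_fraction:
                support += 1
        if support >= min_units:
            links.append((r, s, "H9"))
    return links

def inherit_to_superslots(unit: Unit, links: List[Link]) -> Dict[Tuple[str, str], str]:
    """Rule (A): copy each subslot entry to the superslot, remembering why it is there."""
    justifications: Dict[Tuple[str, str], str] = {}
    for r, s, why in links:
        for entry in unit.get(r, set()):
            if entry not in unit.setdefault(s, set()):
                unit[s].add(entry)
                justifications[(s, entry)] = why
    return justifications

units: Dict[str, Unit] = {
    "Primes": {"DefinedUsing": {"Divisors-of"},
               "GoodConjecUnits": {"Times", "Divisors-of", "Exponentiate",
                                   "Nos-with-3-divis", "Squaring"}},
    "Squares": {"DefinedUsing": {"Times"},                     # hypothetical unit
                "GoodConjecUnits": {"Times", "Primes"}},
    "Evens": {"DefinedUsing": {"Divisors-of"},                 # hypothetical unit
              "GoodConjecUnits": {"Divisors-of", "Odds"}},
}

links = h9(units)
new_concept: Unit = {"DefinedUsing": {"Primes"}}               # a freshly defined concept
why = inherit_to_superslots(new_concept, links)
print(links)
print(new_concept, why)
```

The entry placed on GoodConjecUnits is only a plausible expectation, which is why it carries H9 as its justification rather than being recorded as an established fact.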

It is important to make clear that the appearance of a value v as an entry on slot s of
concept c does not necessarily mean that it is formally proven that v merits a position there; rather,
it is merely plausible. Any entry v can have an explicit justification, but in lieu of any information
to the contrary, the default justification is merely empirical. Thus, when an entry, say Palindromes,
is on the GoodConjecUnits slot of Primes, it may mean that some interesting conjectures have been
found between Primes and Palindromes, or just that it is suspected -- and expected -- that such
conjectures can be found if one takes the trouble to look for them.


Thanks to the large number of useful specialized slots, large IF/THEN rules can be compactly,
conveniently, efficiently represented as simple links. Some of these useful slots are very general,
but many are domain dependent. Thus, as new domains of knowledge emerge and evolve, new
kinds of slots must be devised if this powerful property is to be preserved. The next natural
question is, therefore, "How can useful new slots be found?" The last two sentences are the final
two points of our original five-point programme (Figure 2), and the next section answers them by
way of continuing the example we've begun in this section.


NAME: Archetypical-"Defined-Using"-slot
SPECIALIZATIONS: 

SubSlots: Really-Defined-Using, Could-Have-Defined-Using
GENERALIZATIONS: 

SuperSlots:  Origin,  GoodConjecUnits 

Justification:  H9 

IS-A:  Kind  of  slot 
WORTH:  300 

ORIGIN:  Specialization  of  Origin 

Defined-using:  Specialize 
Creation-date:  9/18/79  15:43 
AVERAGE-SIZE:  1 
FORMAT:  Set 
FILLED-WITH:  Concepts 

CACHE?  Always-Cache 
MAKES-SENSE-FOR:  Concepts 

Figure  21.  Part  of  the  concept  containing  centralizing  knowledge  about  all  DefinedUsing  slots. 


7.  Heuristics  used  to  develop  new  representations 

The  example  here  shows  how  new  kinds  of  slots  can  be  discovered  and  used  to  advantage.  This  is 
just  an  extension  of  a  given  representation,  rather  than  true  exploration  in  "the  space  of  all 
representations of knowledge". I believe the latter will someday be possible, using nothing more
than  a  body  of  heuristics  for  guidance,  but  we  do  not  yet  have  enough  experience  to  formulate  the 
necessary  rules. 

Each kind of representation makes some set of operations efficient, often at the expense of other
operations. Thus, an exploded-view diagram of a bicycle makes it easy to see which parts touch
each other; sequential, verbal instructions make it easy to assemble the bicycle; an axiomatic
formulation makes it easy to prove properties about it; etc.

As  a  field  matures,  its  goals  vary,  its  paradigm  shifts,  the  questions  to  investigate  change,  the 
heuristics  and  algorithms  to  bring  to  bear  on  those  questions  evolve.  Therefore,  the  utility  of  a 
given  representation  is  bound  to  vary  both  from  domain  to  domain  and  within  a  domain  from  time 
to time, much as did that of a given corpus of heuristics. The representation of today must adapt
or give way to a new one -- or the field itself is likely to stagnate and be supplanted.

Where  do  these  new  representations  come  from?  The  most  painless  route  is  to  merely  select  a  new 
one  from  the  stock  of  existing  representational  schemes.  Choosing  an  appropriate  representation 
means picking one which lets you quickly carry out the operations you're now going to carry out
most frequently.


In case there is no adequate existing representation, you may try to extend one, or devise a whole
new one (good luck!), or (most frequently) simply employ a set of known ones, whose union makes
all the common operations fast. Thus, when I buy a bicycle, I expect both diagrams and printed
instructions to be provided. The carrying along of multiple representations simultaneously, and the
concomitant need to shift from one to another, has not been much studied -- or attempted -- in
AI to date, except in very tiny worlds (e.g., the Missionaries & Cannibals puzzle).

There  are  several  levels  at  which  "new  representations"  can  be  found.  At  the  lowest  level,  one  may 
say  that  AM  changed  its  representation  every  time  it  defined  a  new  domain  concept  or  predicate, 
thereby  changing  its  vocabulary  out  of  which  new  ones  could  be  built. 

Much more significant would be the definition of new kinds of slots, typically ones specific to --
and very useful for -- some newly-discovered field of knowledge. For instance, when AM found
the unique factorization conjecture, it would have been good if it had defined a new kind of slot,
Prime-Factors, that every number could have had. A rule capable of this second-level
representation augmentation is the following one:

IF  the  average  size  of  s  slots  is  large, 

THEN  propose  a  new  task:  replace  s  by  new  specializations  of  s. 

The vague terms in the rule would have specific computational interpretations, of course; for
instance, "large" might mean ">10", or "> 3 times the average size of all slots", or "larger than any
other slot", or (most useful from a computational efficiency viewpoint) "larger than the average
number of slots a unit has". It might cause the Examples slot to be broken into several subslots,
such as ExtremeExamples, TypicalExamples, BoundaryExamples, etc. It might cause Factors to be
split up into PrimeFactors, LargeFactors, etc. Note that the subslots will not in general be disjoint.
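
To make one of these readings concrete, here is a minimal Python sketch of the rule. It assumes that units are plain dictionaries mapping slot names to lists of entries, and it takes "large" to mean "> 3 times the average size of all slots". The function names and the toy data are hypothetical illustrations, not taken from AM.

```python
from statistics import mean

def average_slot_size(units, slot):
    """Average number of entries on `slot`, over the units that have it."""
    sizes = [len(u[slot]) for u in units.values() if slot in u]
    return mean(sizes) if sizes else 0.0

def propose_specializations(units, factor=3.0):
    """Return the slots whose average size exceeds `factor` times the
    average size of all slots; each is a candidate for being split."""
    slots = sorted({s for u in units.values() for s in u})
    averages = {s: average_slot_size(units, s) for s in slots}
    overall = mean(list(averages.values()))
    return [s for s in slots if averages[s] > factor * overall]

# Hypothetical toy knowledge base: only Examples is unusually long.
units = {
    "Primes": {
        "Examples": [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37],
        "IsA": ["Set"],
        "Worth": [800],
        "Origin": ["Divisors-of"],
    },
    "Evens": {
        "Examples": list(range(0, 24, 2)),
        "IsA": ["Set"],
        "Worth": [300],
    },
}

print(propose_specializations(units))
```

The sketch only proposes which slots to split; deciding what the new subslots should be (ExtremeExamples, TypicalExamples, and so on) is left to other heuristics.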

The third and final level at which "new representations" can be interpreted is to actually shift from
one entire scheme to another -- perhaps novel -- one. The following two rules indicate when a
certain type of shift is appropriate:

IF  the  problem  is  a  geometric  one, 

THEN  draw  a  diagram. 

IF  most  units  have  most  of  their  possible  slots  filled  in, 

THEN  shift  from  property  lists  to  record  structures. 

All the heuristics of this type are specializations of the general one which says IF some operation is
performed frequently, THEN shift to a representation in which it is very inexpensive to perform.
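
The second of the two rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: units are dictionaries, "most" is read as "more than half" (for both units and slots), and the names fraction_filled and choose_representation are hypothetical.

```python
def fraction_filled(unit, possible_slots):
    """Fraction of the possible slots that hold at least one value."""
    filled = sum(1 for s in possible_slots if unit.get(s))
    return filled / len(possible_slots)

def choose_representation(units, possible_slots):
    """IF most units have most of their possible slots filled in,
    THEN recommend record structures; otherwise keep property lists."""
    well_filled = sum(
        1 for u in units.values()
        if fraction_filled(u, possible_slots) > 0.5
    )
    if well_filled > len(units) / 2:
        return "record structures"
    return "property lists"

SLOTS = ["IsA", "Worth", "Examples", "Origin"]

dense = {
    "Primes": {"IsA": ["Set"], "Worth": [800], "Examples": [2, 3, 5]},
    "Evens": {"IsA": ["Set"], "Worth": [300], "Examples": [0, 2],
              "Origin": ["Times"]},
}
sparse = {
    "Primes": {"IsA": ["Set"]},
    "Evens": {"Worth": [300]},
}

print(choose_representation(dense, SLOTS))
print(choose_representation(sparse, SLOTS))
```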

Let  us  continue  our  example.  Here  is  a  heuristic  which  is  capable  of  reacting  to  a  situation  by 
defining  an  entirely  new  slot,  built  up  from  old  ones,  a  new  slot  which  it  expects  will  be  useful: 

H10:  IF a slot s is very important, and all its values are units,

THEN-CREATE-NEW-KIND-OF-SLOT which contains "all the relations
among the values of my s slot"

When the number stored in the Worth slot of the GoodConjecUnits concept is large enough, the
system attends to the task of explicitly studying GoodConjecUnits. Several heuristics are relevant
and fire; among them is H10, the rule shown above. It then synthesizes a whole new unit, calling it
RelationsAmongEntriesOnMy"GoodConjecUnits"Slot. Every known way in which entries on the
GoodConjecUnits slot of a concept C relate to each other will be recorded on this new slot of C.

For instance, take a look at the Primes concept (Figure 20). Its GoodConjecUnits slot contains the
following entries: Times, Divisors-of, Exponentiation, Squaring, and Numbers-with-three-divisors.
The first two of these entries are inverses of each other; that is, if you look over the Times unit,
you will see a slot called Inverse which is filled with names of concepts, including Divisors-of. Similarly,
still looking over the Times unit, one can see a slot called Repeat which is filled with the entry
Exponentiation, and one can see a slot called Compose filled with Squaring. So Inverse and Repeat
and Compose are some of the relations connecting entries on the GoodConjecUnits slot of Primes,
hence the program will record Inverse and Repeat and Compose as three entries on the
RelationsAmongEntriesOnMy"GoodConjecUnits"Slot slot of the Primes concept.
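
The computation just described can be sketched as follows. This is a minimal Python illustration, assuming units are dictionaries of slots and that a slot of unit X counts as a "relation" whenever one of its values is another entry on the slot being studied. The function name relations_among_entries and the IsA entry on Times are hypothetical.

```python
def relations_among_entries(units, concept, slot):
    """Collect every slot name r such that, for two entries X and Y
    on `slot` of `concept`, Y appears on the r slot of X."""
    entries = units[concept].get(slot, [])
    relations = []
    for x in entries:
        for r, values in units.get(x, {}).items():
            linked = any(y in values for y in entries if y != x)
            if linked and r not in relations:
                relations.append(r)
    return relations

units = {
    "Primes": {
        "GoodConjecUnits": ["Times", "Divisors-of", "Exponentiation",
                            "Squaring", "Numbers-with-three-divisors"],
    },
    "Times": {
        "Inverse": ["Divisors-of"],
        "Repeat": ["Exponentiation"],
        "Compose": ["Squaring"],
        "IsA": ["Operation"],   # not a link to another entry, so ignored
    },
    "Divisors-of": {"Inverse": ["Times"]},
}

print(relations_among_entries(units, "Primes", "GoodConjecUnits"))
```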

Now it so happens that several concepts wind up with "Compose" and "Inverse" as entries on their
RelationsAmongEntriesOnMy"GoodConjecUnits"Slot slot. The alert reader may suspect that this is
no accident, and an alert program should suspect that, too. Indeed, the following heuristic says that
it might be useful to behave as if "Compose" and "Inverse" were always going to eventually appear
there:

H11:  IF (for many units u) the s slot of u contains the same values v_i,

THEN-ADD-VALUE v_i to the ExpectedEntries slot of the Typical-s-slot unit.

This causes the program to add Compose and Inverse to the slot called ExpectedEntries of the
concept called RelationsAmongEntriesOnMy"GoodConjecUnits"Slot. This one small act, the
creation of a pair of links, is in effect creating a new heuristic which says:

IF a concept gets entries X and Y on its GoodConjecUnits slot,

THEN predict that it will get Inverse(X), Inverse(Y), and Compose(X,Y) there as well.
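
A minimal Python sketch of H11 follows, assuming "many units" is read as "at least a given fraction of the units that have the slot". The threshold, the function name h11_expected_entries, and the toy units other than Primes are hypothetical.

```python
from collections import Counter

REL_SLOT = 'RelationsAmongEntriesOnMy"GoodConjecUnits"Slot'

def h11_expected_entries(units, slot, min_fraction=0.75):
    """Return the values that occur on `slot` for at least
    `min_fraction` of the units having that slot; these are the
    values to add to ExpectedEntries of the Typical-s-slot unit."""
    holders = [u for u in units.values() if slot in u]
    counts = Counter(v for u in holders for v in set(u[slot]))
    needed = min_fraction * len(holders)
    return sorted(v for v, n in counts.items() if n >= needed)

units = {
    "Primes": {REL_SLOT: ["Inverse", "Repeat", "Compose"]},
    "Squares": {REL_SLOT: ["Inverse", "Compose"]},
    "Evens": {REL_SLOT: ["Compose", "Inverse"]},
    "Palindromes": {REL_SLOT: ["Inverse"]},
}

# The unit describing the new kind of slot receives the expected entries.
typical_slot_unit = {
    "ExpectedEntries": h11_expected_entries(units, REL_SLOT),
}
print(typical_slot_unit["ExpectedEntries"])
```

Repeat is left out because it occurs for only one unit; with a lower threshold it would be included as well.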

How is this actually used? Consider what occurs when the program defines a new concept, C,
which is DefinedUsing Divisors-of. As soon as that concept is formed, the heuristic link from
DefinedUsing to GoodConjecUnits automatically fills in Divisors-of as an entry on the
GoodConjecUnits slot of C. Next, the links just illustrated above come into action, and place
Inverse and Compose on the RelationsAmongEntriesOnMy"GoodConjecUnits"Slot slot of C. That
in turn causes the inverse of Divisors-of, namely Times, to be placed on the GoodConjecUnits slot
as well as the already-present entry, Divisors-of. Finally, that causes the program to go off looking
for conjectures between C and either multiplication or division. When a conjecture comes in
connecting C to one of them, it will get a higher a priori estimated worth than one which doesn't
connect to them.
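
The chain of events above can be sketched in Python. This is an illustration under simplifying assumptions: units are dictionaries, each expected relation is followed one step from the DefinedUsing entries (so Compose(X,Y) is approximated by reading the Compose slot of X), and the function name define_concept is hypothetical.

```python
REL_SLOT = 'RelationsAmongEntriesOnMy"GoodConjecUnits"Slot'

def define_concept(units, name, defined_using, expected_relations):
    """Create a new concept and fill in the slots that the heuristic
    links predict, in the order described in the text."""
    concept = {"DefinedUsing": list(defined_using)}
    # Step 1: the link from DefinedUsing to GoodConjecUnits (H9).
    good = list(defined_using)
    # Step 2: the expected relations are placed on the new slot.
    concept[REL_SLOT] = list(expected_relations)
    # Step 3: follow each expected relation from each entry.
    for x in defined_using:
        for r in expected_relations:
            for y in units.get(x, {}).get(r, []):
                if y not in good:
                    good.append(y)
    concept["GoodConjecUnits"] = good
    units[name] = concept
    return concept

units = {"Divisors-of": {"Inverse": ["Times"]}}
c = define_concept(units, "C", ["Divisors-of"], ["Inverse", "Compose"])
print(c["GoodConjecUnits"])
```

The program would then give a higher a priori worth to conjectures connecting C with any entry on the resulting GoodConjecUnits slot.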

If only we'd had the new heuristics back when Primes was first defined, they would have therefore
embodied enough "common sense" to prefer the Unique Factorization Theorem to Goldbach's
conjecture. If we'd had them earlier, these heuristics would have led us to our present state much
sooner. Because of our assumptions about the continuity of the world, such heuristics should
nevertheless be useful from time to time in the future.

Notice that there's nothing special about mathematics -- the newly synthesized heuristics have to do
with very general slots, like DefinedUsing and GoodConjecUnits. For instance, as soon as a new
concept (say Middle-Class) is DefinedUsing Income, the program immediately fills in the following
underlined information:

NAME:  Middle-Class

Defined-using:  Income

RelationsAmongEntriesOnMy"GoodConjecUnits"Slot:  Inverse, Compose
Good-Conjec-Units:  Income, Spending, EarnedInterest

Thus, it goes off looking for (and will expect more from) conjectures between Middle-Class and any
of Income, Spending, and EarnedInterest. Thus the new slot is useful, though it has a terrible
name, and the new little heuristics (which looked like little links or facts but were actually
permission to make daring guesses) were powerful after all.

We have relied heavily on our representation being very structured; in a very uniform one (say a
calculus of linear propositions, with the only operations being Assert and Match) it would be
difficult to obtain enough empirical data to easily modify that representation. This is akin to the
nature of discovering domain facts and heuristics: if the domain is too simple, it's harder to find
new knowledge and -- in particular -- new heuristics. Heuristics for propositional calculus are much
fewer and weaker than those available for guiding work in predicate calculus; they in turn pale
before the rich variety available for guiding theorem proving "the way mathematicians really do it".
This is an argument for attacking seemingly-difficult problems which turn out to be lush with
structure, rather than working in worlds so constrained that their simplicity has sterilized them of
heuristic structure.


8.  Conclusions 


We  began  by  noting  that  the  limiting  step  in  the  construction  of  expert  systems  was  building  the 
knowledge  base,  and  that  one  solution  would  be  for  the  program  itself  to  automatically  acquire  new 
knowledge:  to  learn  via  discovery. 

The  heuristic  search  paradigm  seems  adequate  to  guide  a  program  in  formulating  useful  new 
concepts,  gathering  data  about  them,  and  noticing  relationships  connecting  them.  However,  as  the 
body  of  domain-specific  facts  grows,  the  old  set  of  heuristics  becomes  less  and  less  relevant,  less  and 
less  capable  of  guiding  the  discovery  process  effectively.  New  heuristics  must  also  be  discovered. 

Since  heuristics  is  a  domain  of  knowledge,  much  like  any  other,  one  can  imagine  an  expert  system 
that  works  in  that  field.  That  is,  a  corpus  of  heuristics  can  grow  and  improve  and  gather  data  about 
itself.  This  process  is  very  slow  and  explosive,  yet  it  can  be  greatly  facilitated  by  having  "the  right 
representation". In the case of a schematized representation, this means having the right set of slots
or  attributes,  the  right  set  of  attached  procedures,  etc.  We  saw  how  heuristics  can  lead  to  the 
development  of  useful  new  kinds  of  slots,  to  improved  representations  of  knowledge.  It  was 
hypothesized  that  the  same  representation  we  use  for  attributes  and  values  of  object-level  concepts 
could  also  be  used  to  represent  heuristics  and  even  to  represent  representation.  To  draw  some 
examples from the RLL system [Lenat & Greiner 80]: Primes (a set of numbers),
GeneralizeRarePredicate (a heuristic), GeneralizeRareHeuristic (a meta-heuristic), and Isa (a
representation concept) are all represented adequately as units with slots having values. A single
interpreter  runs  both  meta-heuristics  and  heuristics,  and  is  itself  represented  as  a  collection  of  units. 
While  meta-heuristics  could  be  tagged  to  distinguish  them  from  heuristics,  the  utility  of  doing  so 
rests on the existence of rules which genuinely treat them differently somehow -- and such rules
have  not  to  date  been  encountered. 
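
The uniformity claimed here can be illustrated with a small Python sketch. It is not RLL code: the slot names SuccessRate and FiringRate, the Is-Perfect predicate, the 5% cutoff, and the interpret function are all hypothetical, chosen only to show one interpreter running a heuristic and a meta-heuristic over the same kind of unit.

```python
def rarely(slot, cutoff=0.05):
    """Build a test that holds when the number on `slot` is below cutoff."""
    return lambda u: u.get(slot, 1.0) < cutoff

def interpret(rule, units):
    """Run one rule unit over every unit; return the tasks it proposes.
    The same function serves heuristics and meta-heuristics alike."""
    return [rule["Then"](u) for u in units.values() if rule["If"](u)]

units = {
    "Primes": {"Name": "Primes", "IsA": "Set"},
    "Is-Perfect": {"Name": "Is-Perfect", "IsA": "Predicate",
                   "SuccessRate": 0.01},
    "GeneralizeRarePredicate": {
        "Name": "GeneralizeRarePredicate", "IsA": "Heuristic",
        "FiringRate": 0.01,
        "If": lambda u: u["IsA"] == "Predicate" and rarely("SuccessRate")(u),
        "Then": lambda u: "generalize " + u["Name"],
    },
    "GeneralizeRareHeuristic": {
        "Name": "GeneralizeRareHeuristic", "IsA": "Heuristic",
        "FiringRate": 0.5,
        "If": lambda u: u["IsA"] == "Heuristic" and rarely("FiringRate")(u),
        "Then": lambda u: "generalize " + u["Name"],
    },
}

# A heuristic acting on an object-level unit:
print(interpret(units["GeneralizeRarePredicate"], units))
# A meta-heuristic acting on a heuristic, through the same interpreter:
print(interpret(units["GeneralizeRareHeuristic"], units))
```

Nothing in interpret distinguishes the two rules; the only difference lies in which units their If-parts accept.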

One  of  the  necessary  steps  in  this  research  was  the  explication  of  at  least  a  rudimentary  theory  of 
heuristics,  an  analysis  of  their  innate  source  of  power,  their  nature.  This  turned  out  to  rest  upon  the 
continuity  of  our  world:  if  the  situation  is  very  similar,  so  is  the  set  of  (in)appropriate  actions  to 
take.  Corollaries  of  this  provide  the  justification  for  the  use  of  analogy  and  even  for  the  utility  of 
memory. The central assumption was seen to be just that -- an assumption which is often false in
small  ways,  but  which  is  nevertheless  a  useful  fiction  to  be  guided  by. 

By  graphing  the  power  curves  of  a  heuristic  (the  utility  of  that  heuristic  as  a  function  of  task  being 
worked on), we were able to see the gains -- and dangers -- of specializing and generalizing them to
get  new  ones.  Such  curves  determine  a  preferred  order  for  obeying  relevant  heuristics,  and  suggest 
several  specific  new  attributes  worth  measuring  and  recording  for  each  heuristic  (e.g.,  the  sharpness 
with  which  it  flips  from  useful  to  harmful,  as  one  leaves  its  domain  of  relevance). 

By arranging all the world's heuristics (well, at least all of AM's, and several more randomly-chosen
ones from chess, biology, and oil spills) into a hierarchy using the relation "More-General-Than",
we were surprised to find that hierarchy very shallow, thereby implying that analogy would be a more
useful method of generating new heuristics than would specialization or generalization. By noting
that both Utility and Task have several dimensions, most of this problem went away. By noting
that two heuristics can have many important relations connecting them, of which More-General-Than
is just one example, the shallowness problem turns into a powerful heuristic: if a new heuristic
h is to differ from an old one along some dimension (relation) r, then use analogy to get h if r's
graph is shallow, and use generalization/specialization if r's graph is deep. We also discussed some
useful slots which heuristics can have, and a method for generating new kinds of slots.
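
The heuristic just stated can be sketched in Python. The sketch assumes that the graph of a relation r is acyclic and is given as a dictionary from each heuristic to those immediately below it under r. The cutoff separating "shallow" from "deep" and all the names are hypothetical.

```python
def depth_below(graph, node):
    """Length of the longest chain of r-links starting at `node`."""
    children = graph.get(node, [])
    if not children:
        return 0
    return 1 + max(depth_below(graph, c) for c in children)

def graph_depth(graph):
    """Depth of the whole graph: the longest chain from any root."""
    below = {c for cs in graph.values() for c in cs}
    roots = set(graph) - below
    return max((depth_below(graph, r) for r in roots), default=0)

def method_for_new_heuristic(graph, shallow_limit=2):
    """Use analogy if r's graph is shallow, and
    generalization/specialization if it is deep."""
    if graph_depth(graph) <= shallow_limit:
        return "analogy"
    return "generalization/specialization"

# A shallow graph: one general heuristic with many direct specializations.
shallow = {"Look-at-extremes": ["h1", "h2", "h3", "h4"]}
# A deep graph: a long chain of successively more specific heuristics.
deep = {"a": ["b"], "b": ["c"], "c": ["d"]}

print(method_for_new_heuristic(shallow))
print(method_for_new_heuristic(deep))
```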

Before the research programme outlined in Figure 2 can be completed, much more must be known
about analogy, and more complete theories of heuristics and of representation must exist. Toward
that goal we must obtain more empirical results from programs trying to find useful new
domain-specific heuristics and representations.